Thursday, 8 November 2012

Hadoop installation in fully distributed mode



Compatibility Requirements

S.No   Category            Supported
1      Languages           Java, Python, Perl, Ruby, etc.
2      Operating System    Linux (server deployment, most preferred), Windows (development only), Solaris
3      Hardware            32-bit Linux (64-bit for large deployments)


Installation Items


S.No   Item                           Version
1      jdk-6u25-linux-i586.bin        Java 1.6 or higher
2      hadoop-0.20.2-cdh3u0.tar.gz    Hadoop 0.20.2 (CDH3u0)

Note: Both items must be installed on the NameNode and all DataNode machines.

Installation Requirements

S.No   Requirement                                                    Reason
1      Operating system – Linux recommended for server deployment
       (production environment)
2      Language – Java 1.6 or higher
3      RAM – at least 3 GB per node
4      Hard disk – at least 1 TB                                      For the NameNode machine.
5      Root credentials                                               Changing some system files requires admin permissions.


    

High level Steps

Step #   Activity
1        Binding the IP addresses to the host names in /etc/hosts
2        Setting up passwordless SSH
3        Installing Java
4        Installing Hadoop
5        Setting the JAVA_HOME and HADOOP_HOME variables
6        Updating the .bash_profile file for Hadoop
7        Creating the required folders for the NameNode and DataNodes
8        Configuring the .xml files
9        Setting the masters and slaves on all machines
10       Formatting the NameNode
11       Starting the DFS and MapReduce services
12       Stopping all services



Binding IP addresses with the host names
Before starting the Hadoop installation, bind the IP address of each machine in the cluster to its host name in the /etc/hosts file.
First, check the hostname of your machine with the following command:
$ hostname

Open the /etc/hosts file to bind the IP addresses to the host names:
$ vi /etc/hosts

Add the IP address and hostname of every machine in the cluster, e.g.:
10.11.22.33    hostname1
10.11.22.34    hostname2

Setting up passwordless SSH login
SSH keys let you log in from one system to another without a password. This is required when you run a cluster, so that the start and stop scripts do not prompt you for a password for every node.

First log in on Host1 (hostname of namenode machine) as hadoop user and generate a pair of authentication keys.  Command is:

hadoop@Host1$ ssh-keygen -t rsa
Note: Accept the defaults when prompted, and do not enter a passphrase if asked.
Now use ssh to create the directory ~/.ssh as user hadoop on Host2 (any machine other than the NameNode machine).

hadoop@Host1$ ssh hadoop@Host2 mkdir -p .ssh
hadoop@Host2's password:

Finally, append Host1's new public key to .ssh/authorized_keys on Host2 and enter Host2's password one last time:

hadoop@Host1$ cat /home/hadoop/.ssh/id_rsa.pub | ssh hadoop@Host2 'cat >> .ssh/authorized_keys'
hadoop@Host2's password:

From now on you can log in to Host2 as hadoop from Host1 without a password:
hadoop@Host1$ ssh hadoop@Host2
hadoop@Host2$


NOTE: Make the following changes on Host2:
- Change the permissions of .ssh to 700
- Change the permissions of .ssh/authorized_keys to 640
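For example (run as the hadoop user on Host2):

hadoop@Host2$ chmod 700 ~/.ssh
hadoop@Host2$ chmod 640 ~/.ssh/authorized_keys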


Prepare for installation

Check for previously installed versions of Java and Hadoop on your machine:
$ rpm -qa | grep java

This lists the names of any installed Java packages; repeat with grep hadoop for Hadoop.

Remove all previous versions of Java and Hadoop installed on the machine:
$ rpm -e <package-name>

NOTE: All the installations and extractions are being done in /home/hadoop/

Installing Java
Use the JDK bin file (jdk-6u25-linux-i586.bin) to install Java on your machine. Copy the .bin file to /home/hadoop/.

Execute ./jdk-6u25-linux-i586.bin in /home/hadoop/ (this unpacks the contents into the folder jdk1.6.0_25).
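For example (the chmod is an extra step, needed only if the downloaded file is not already executable):

$ cd /home/hadoop/
$ chmod +x jdk-6u25-linux-i586.bin
$ ./jdk-6u25-linux-i586.bin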


Extract the hadoop package

Syntax:
$ tar -xzvf <hadoop-tar-package>
$ tar -xzvf hadoop-0.20.2-cdh3u0.tar.gz

Configuring HADOOP_HOME
Check whether HADOOP_HOME is set to the folder containing hadoop-core-VERSION.jar:
$ echo $HADOOP_HOME

If it is not set, set it:
$ export HADOOP_HOME=/home/hadoop/<hadoop-version>
For example:
$ cd /home/hadoop/
$ export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u0/


Setting JAVA_HOME
$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/conf/
$ vi  hadoop-env.sh


[Screenshot: hadoop-env.sh file for setting JAVA_HOME]
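In hadoop-env.sh, uncomment and set the JAVA_HOME line; a minimal sketch, assuming the JDK was unpacked into /home/hadoop/jdk1.6.0_25 as above:

export JAVA_HOME=/home/hadoop/jdk1.6.0_25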
Press :wq to save and exit the file

You also need to update the bash profile file:
$ vi   ~/.bash_profile
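A minimal sketch of the lines to append to ~/.bash_profile, assuming the install locations used above:

export JAVA_HOME=/home/hadoop/jdk1.6.0_25
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Reload the profile so the changes take effect in the current shell:
$ source ~/.bash_profile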




Check for hadoop installation confirmation
Run the hadoop command to confirm that the installation is successful.

$ cd <hadoop-home-directory>
Standard path:
$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/
$ bin/hadoop

On a successful installation, the command prints a usage message listing the available hadoop commands.


CONFIGURING HADOOP IN FULLY DISTRIBUTED MODE


Create the dfs.name.dir local directory on the NameNode machine

$ cd /home/hadoop/

 $ mkdir -p data/1/dfs/nn

Creating the directories for storing the data blocks, and the temporary directory (hadoop.tmp.dir), on the DataNode machines

$ cd /home/hadoop/

$ mkdir -p data/1/dfs/dn data/2/dfs/dn data/3/dfs/dn

$ mkdir -p /home/hadoop/tmp

Creating the directories where the TaskTracker stores temporary data, and the system directory for MapReduce jobs

$ cd /home/hadoop/

$ mkdir -p data/1/mapred/local data/2/mapred/local data/3/mapred/local

$ mkdir -p /home/hadoop/mapred/system

Give full permissions to all folders under /home/hadoop/:
$ cd /home/hadoop/

$ chmod 777 *

Navigate to the /home/hadoop/hadoop-0.20.2-cdh3u0/conf directory:

$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/conf

Set up the configuration files under /home/hadoop/hadoop-0.20.2-cdh3u0/conf/.

core-site.xml
$ vi core-site.xml

Parameters of core-site.xml:
fs.default.name → URL of the NameNode.
hadoop.tmp.dir → Path for temporary data.
dfs.replication → The number of times each block is replicated across the cluster.
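A minimal core-site.xml sketch; the NameNode host hostname1, port 54310 and replication factor 3 are placeholder values to adjust for your cluster:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hostname1:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>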



hdfs-site.xml
$ vi hdfs-site.xml

Parameters of hdfs-site.xml:

dfs.name.dir → The directories where the NameNode stores its metadata and edit logs, e.g. the /home/hadoop/data/1/dfs/nn path created above.

dfs.data.dir → The directories where the DataNodes store blocks, e.g. /home/hadoop/data/1/dfs/dn, /home/hadoop/data/2/dfs/dn and /home/hadoop/data/3/dfs/dn.
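A sketch of hdfs-site.xml using the directories created above (note that the directory lists are comma-separated without spaces):

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data/1/dfs/dn,/home/hadoop/data/2/dfs/dn,/home/hadoop/data/3/dfs/dn</value>
  </property>
</configuration>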


Press :wq to save and exit the file

mapred-site.xml
$ vi mapred-site.xml

Parameters of mapred-site.xml:

mapred.local.dir → The directories where the TaskTracker stores temporary data and intermediate map output files while running MapReduce jobs, e.g. /home/hadoop/data/1/mapred/local, /home/hadoop/data/2/mapred/local and /home/hadoop/data/3/mapred/local.

mapred.system.dir → Path on HDFS where the MapReduce framework stores system files, e.g. /home/hadoop/mapred/system/.

mapred.job.tracker → Host (or IP) and port of the JobTracker.
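A sketch of mapred-site.xml; the JobTracker host hostname1 and port 54311 are placeholders for the machine you want the JobTracker to run on:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hostname1:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/data/1/mapred/local,/home/hadoop/data/2/mapred/local,/home/hadoop/data/3/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/mapred/system</value>
  </property>
</configuration>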



Press :wq to save and exit the file
Set the correct owner and permissions on the local directories:

Directory           Owner           Permissions
dfs.name.dir        hdfs:hadoop     drwx------
dfs.data.dir        hdfs:hadoop     drwx------
mapred.local.dir    mapred:hadoop   drwxr-xr-x



$ chmod 700 /home/hadoop/data/1/dfs/nn/ 
$ chmod 700 /home/hadoop/data/1/dfs/dn/   /home/hadoop/data/2/dfs/dn/ /home/hadoop/data/3/dfs/dn/
$ chmod 755 /home/hadoop/data/1/mapred/local/  /home/hadoop/data/2/mapred/local/ /home/hadoop/data/3/mapred/local/
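If you run the daemons as dedicated hdfs and mapred users (as the owners in the table above suggest), also set the ownership as root (# prompt); in this walkthrough everything runs as the hadoop user, so the following is only a sketch:

# chown -R hdfs:hadoop /home/hadoop/data/1/dfs /home/hadoop/data/2/dfs /home/hadoop/data/3/dfs
# chown -R mapred:hadoop /home/hadoop/data/1/mapred /home/hadoop/data/2/mapred /home/hadoop/data/3/mapred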

Setting up the masters and slaves
$ vi conf/masters
Enter the hostname of the machine acting as the SecondaryNameNode.

$ vi conf/slaves
Enter the hostnames (one per line) of the machines acting as DataNodes and TaskTrackers.
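For example, with the hosts bound earlier (hostname1 acting as NameNode and SecondaryNameNode, hostname2 as a slave; adjust for your cluster):

$ cat conf/masters
hostname1
$ cat conf/slaves
hostname2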


Formatting the namenode
Format the NameNode once, before starting the DFS services for the first time; formatting initializes the dfs.name.dir metadata directory on the NameNode and is not needed on subsequent starts. Do not format a running Hadoop NameNode, and do not re-format a NameNode that already holds data, otherwise all your data in the HDFS filesystem will be erased.

$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/

$ bin/hadoop namenode -format

Note: Answer "Y" if it asks whether to re-format.

Starting the dfs service
Run bin/start-dfs.sh on the machine you want the NameNode to run on. This brings up HDFS with the NameNode running on that machine, and DataNodes on the machines listed in the conf/slaves file.

$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/

$ ./bin/start-dfs.sh

NOTE: For any problems, check the log files under /home/hadoop/hadoop-0.20.2-cdh3u0/logs/ on all the machines, and refer to the troubleshooting guide.

Starting the mapred service

For the MapReduce services, run the following command on the machine you want the JobTracker to run on (in my case the NameNode machine; you can choose another machine as well):

$ ./bin/start-mapred.sh
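As a quick sanity check (an extra step, not part of the original guide), the JDK's jps tool lists the running Hadoop daemons on each machine:

$ jps

On the master you should see NameNode and JobTracker (and SecondaryNameNode, if it runs on the same machine); on the slave machines, DataNode and TaskTracker.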


Checking the DFS service report

$ ./bin/hadoop dfsadmin -report

The report shows the cluster's configured capacity, DFS usage, and the list of live DataNodes; verify that all of your DataNodes appear.

Checking the DFS service on the web interface

http://<namenode-ip>:50070/

Checking MapReduce jobs on the web interface

http://<jobtracker-ip>:50030/ (in this setup, the JobTracker runs on the NameNode machine)


Stopping the dfs and mapred services

$ cd /home/hadoop/hadoop-0.20.2-cdh3u0/

$ ./bin/stop-all.sh




