Bala's Blog

JOY OF PROGRAMMING

Category: hadoop map reduce

Setting the Hadoop Cluster (Multi Node Setup)

The following changes need to be made inside the Hadoop configuration folder (conf/).

1)core-site.xml

MASTER:

The master's IP address is given in place of localhost:

<property>

<name>fs.default.name</name>

<value>hdfs://10.229.152.18:10011</value>

</property>

SLAVE:

The same change is made on every slave: replace localhost with the master's IP address, so that all nodes point to the same NameNode.
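Since every node needs the same core-site.xml, the file can be pushed from the master to each slave. Below is a dry-run sketch that only prints the scp commands it would run; the slave IP addresses are hypothetical placeholders, and the path matches the installation directory used later in this post.

```shell
# Dry-run sketch: print the scp commands that would copy core-site.xml
# from the master to each slave. The slave IPs below are placeholders.
SLAVES="10.229.152.19 10.229.152.20"
CONF=/home/user1/asl-hadoop-0.20.2+228/conf/core-site.xml
OUT=$(for ip in $SLAVES; do echo "scp $CONF user1@$ip:$CONF"; done)
echo "$OUT"
```

Removing the `echo` inside the loop body (i.e. running `scp` directly) would perform the actual copies, assuming passwordless SSH is already set up between the nodes.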

3)hdfs-site.xml

MASTER:

<!-- The replication value is the number of DataNodes: slaves + master -->

<configuration>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<!-- The NameNode metadata directory is specified here -->

<property>

<name>dfs.name.dir</name>

<value>/home/user1/asl-hadoop-0.20.2+228/filesystem/name</value>

</property>

<!-- DataNode storage directory path -->

<property>

<name>dfs.data.dir</name>

<value>/home/user1/asl-hadoop-0.20.2+228/filesystem/data</value>

</property>

<!-- Temporary directory path -->

<property>

<name>dfs.temp.dir</name>

<value>/home/user1/asl-hadoop-0.20.2+228/filesystem/temp</value>

</property>

</configuration>
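The three directories referenced above must exist on disk before the NameNode is formatted. A minimal sketch, mirroring the layout in the values above; the base path is a stand-in, since the real one from this post (/home/user1/asl-hadoop-0.20.2+228) depends on the installation:

```shell
# Create the name, data and temp directories used by hdfs-site.xml.
# HADOOP_FS is a placeholder for the installation directory.
HADOOP_FS="${HADOOP_FS:-/tmp/hadoop-fs-demo}"
mkdir -p "$HADOOP_FS/filesystem/name" \
         "$HADOOP_FS/filesystem/data" \
         "$HADOOP_FS/filesystem/temp"
ls "$HADOOP_FS/filesystem"
```

Once the directories exist, the HDFS filesystem is initialized once on the master with bin/hadoop namenode -format.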

SLAVES:

No changes; keep the default values.

4)mapred-site.xml

MASTER:

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>10.229.152.18:10012</value>

</property>

<!-- A local path must be given; the directory is created automatically -->

<property>

<name>mapred.local.dir</name>

<value>/home/user1/asl-hadoop-0.20.2+228/local</value>

</property>

<!-- Rule of thumb: number of map tasks = number of slaves * 10 -->

<property>

<name>mapred.map.tasks</name>

<value>30</value>

</property>

<!-- Rule of thumb: number of reduce tasks = number of slaves * 3 -->

<property>

<name>mapred.reduce.tasks</name>

<value>6</value>

</property>

</configuration>
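The two rules of thumb above reduce to simple arithmetic. The slave count below is a made-up example (two slaves), so the resulting numbers are illustrative only:

```shell
# Rule-of-thumb task counts for a hypothetical two-slave cluster.
NUM_SLAVES=2
MAP_TASKS=$((NUM_SLAVES * 10))     # slaves * 10
REDUCE_TASKS=$((NUM_SLAVES * 3))   # slaves * 3
echo "mapred.map.tasks=$MAP_TASKS mapred.reduce.tasks=$REDUCE_TASKS"
```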

SLAVES:

No changes; keep the default values.

5) conf/masters and conf/slaves

MASTER:

conf/masters

master_ip

conf/slaves

master_ip

slave_ip
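The two files on the master can be generated in one go. A sketch, assuming a scratch conf directory and placeholder IPs (the master's IP appears in conf/slaves as well because the master also runs a DataNode and TaskTracker):

```shell
# Write conf/masters and conf/slaves on the master node.
# CONF_DIR and the IP addresses are placeholders for illustration.
CONF_DIR="${CONF_DIR:-/tmp/hadoop-conf-demo}"
mkdir -p "$CONF_DIR"
printf '%s\n' "10.229.152.18" > "$CONF_DIR/masters"
printf '%s\n' "10.229.152.18" "10.229.152.19" > "$CONF_DIR/slaves"
cat "$CONF_DIR/masters" "$CONF_DIR/slaves"
```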

SLAVE:

conf/masters

localhost

conf/slaves

localhost

${HADOOP_HOME}/bin/start-all.sh

After executing start-all.sh, the output of jps on each running node should look like the following.

On the master:

23763 TaskTracker

23186 NameNode

23603 JobTracker

23359 DataNode

On each slave:

3232 DataNode

6772 TaskTracker
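A quick way to verify the daemon list is to grep the jps output. The helper below is a hypothetical sketch; here it is fed the sample master output from above instead of a live jps run:

```shell
# check_daemons: read `jps` output on stdin and verify each named
# daemon appears. Usage on a live node: jps | check_daemons NameNode ...
check_daemons() {
  out=$(cat)
  for d in "$@"; do
    printf '%s\n' "$out" | grep -q "$d" || { echo "MISSING: $d"; return 1; }
  done
  echo "all daemons running"
}

# Check against the sample master output shown above:
RESULT=$(printf '23763 TaskTracker\n23186 NameNode\n23603 JobTracker\n23359 DataNode\n' |
  check_daemons NameNode DataNode JobTracker TaskTracker)
echo "$RESULT"
```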

SUCCESSFULLY COMPLETED HADOOP CLUSTER…

Running hadoop in eclipse

Hi folks

Here are the steps for running a Hadoop MapReduce program in Eclipse.

1) Make sure that Eclipse and Hadoop 0.20.2 are installed.

2) Then follow the steps given below:

i) Create a new Java Project in Eclipse and name it “hadoop-0.20.2”.
ii) Import the hadoop-0.20.2.tar.gz into the above project.
iii) Ant.jar must be imported into the library folder of the project.
iv) Rewrite eclipse/workspace/hadoop-0.20.2/.classpath to add source folders and necessary libraries into the project.

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
	<classpathentry kind="output" path="bin"/>
</classpath>

v) Then refresh the project.

Then run the MapReduce program by creating a new Java project, e.g. WordCount:

1) In the build path, add the Eclipse project hadoop-0.20.2.

2) Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar in the Libraries tab.

3) Then add commons-httpclient-3.1.jar and the Apache Commons Logging jar, commons-logging-1.0.4.jar.

4) Then create a file called log4j.properties inside bin/wordcount/ so that the job's log output is shown in the Eclipse console:

log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

5) Then we can execute the WordCount program in Eclipse.

Thanks
Balasundaram J K