Bala's Blog

JOY OF PROGRAMMING

Month: December, 2011

Setting Up a Hadoop Cluster (Multi-Node Setup)

The following are the changes that need to be made inside the Hadoop conf/ folder.

1) core-site.xml

MASTER:

Change: the master's IP address is given in place of localhost.

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.229.152.18:10011</value>
</property>

SLAVES:

Change: the same property is set with the master's address, so that every node points at the same NameNode.

2) hdfs-site.xml

MASTER:

<configuration>

<!-- The value of replication is the number of slaves + master -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<!-- Here the name node directory is specified -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/user1/asl-hadoop-0.20.2+228/filesystem/name</value>
</property>

<!-- Data node directory path -->
<property>
  <name>dfs.data.dir</name>
  <value>/home/user1/asl-hadoop-0.20.2+228/filesystem/data</value>
</property>

<!-- Temporary directory path -->
<property>
  <name>dfs.temp.dir</name>
  <value>/home/user1/asl-hadoop-0.20.2+228/filesystem/temp</value>
</property>

</configuration>

SLAVES:

No changes; the default values are used.

3) mapred-site.xml

MASTER:

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>10.229.152.18:10012</value>
</property>

<!-- A local path needs to be given; the directory is created there automatically -->
<property>
  <name>mapred.local.dir</name>
  <value>/home/user1/asl-hadoop-0.20.2+228/local</value>
</property>

<!-- As a rule of thumb, the number of map tasks may be (number of slaves * 10) -->
<property>
  <name>mapred.map.tasks</name>
  <value>30</value>
</property>

<!-- As a rule of thumb, the number of reduce tasks may be (number of slaves * 3) -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>

</configuration>

SLAVES:

No changes; the default values are used.

4) conf/masters and conf/slaves

MASTER:

conf/masters:

master_ip

conf/slaves:

master_ip
slave_ip

SLAVES:

conf/masters:

localhost

conf/slaves:

localhost
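As a concrete sketch, with the master at 10.229.152.18 and one slave at a made-up address 10.229.152.19, the two files on the master would contain the following (the `conf/masters:` and `conf/slaves:` labels are not part of the file contents; note the master is also listed in conf/slaves because it runs a DataNode and TaskTracker, as the jps output below shows):

```
conf/masters:
10.229.152.18

conf/slaves:
10.229.152.18
10.229.152.19
```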

Finally, start the cluster:

${HADOOP_HOME}/bin/start-all.sh

After executing start-all.sh, the jps output on each running node must look like the following.

In the master:

23763 TaskTracker
23186 NameNode
23603 JobTracker
23359 DataNode

In the slave:

3232 DataNode
6772 TaskTracker

HADOOP CLUSTER SUCCESSFULLY SET UP…

How to remove multiple deleted files in a git repository?

First use

git add -u (stages all the files in "deleted" mode, along with other modified tracked files) or git add -A (which also stages untracked files)

After that we need to commit as usual:

git commit -m "Deleted files manually"

This command commits all the deleted files.

git push

After the push, all the deleted files are removed from the remote repository.
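The whole flow can be sketched in a throwaway repository (the repository and file names here are made up, and the final git push is omitted since the sketch has no remote):

```shell
# Hypothetical demo repository; names are made up for illustration.
mkdir demo-repo && cd demo-repo
git init -q
git config user.email "bala@example.com"
git config user.name "Bala"
echo "one" > keep.txt
echo "two" > gone.txt
git add . && git commit -q -m "initial"
rm gone.txt            # deleted outside git; shows as "deleted:" in git status
git add -u             # stages the deletion (and any modified tracked files)
git commit -q -m "Deleted files manually"
git ls-files           # only keep.txt remains tracked
```

In a real repository, a final git push then publishes the deletion to the remote.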

How to add the & symbol in an XML element value?

XML does not accept a raw "&" when it is given inside a value. To make it work, we need to write

&amp;

instead; the parser then reads it back as "&" and does not throw an error.
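A minimal sketch of how the escaped value looks in a configuration file (the property name here is made up for illustration):

```xml
<!-- The raw value "Research & Development" must be escaped inside the element -->
<property>
  <name>example.department</name>            <!-- hypothetical property name -->
  <value>Research &amp; Development</value>  <!-- read back as "Research & Development" -->
</property>
```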


Thanks

Bala

Running Hadoop in Eclipse

Hi folks,

Here are the steps I followed for running a Hadoop MapReduce program in Eclipse.

1) Make sure that Eclipse and Hadoop 0.20.2 are installed.

2) Then follow the steps given below:

i) Create a new Java Project in Eclipse and name it "hadoop-0.20.2".
ii) Import hadoop-0.20.2.tar.gz into the above project.
iii) ant.jar must be imported into the library folder of the project.
iv) Rewrite eclipse/workspace/hadoop-0.20.2/.classpath to add the source folders and necessary libraries to the project.

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
	<classpathentry kind="output" path="bin"/>
</classpath>

v) Then refresh the project.

Then run the MapReduce program by creating a new Java project, e.g. WordCount:

1) In the build path, add the Eclipse project hadoop-0.20.2.

2) Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar on the Libraries tab.

3) Then add commons-httpclient-3.0.1.jar and the Apache Commons Logging jar commons-logging-1.0.4.jar.

4) Then create a file called log4j.properties inside bin/wordcount/ for viewing the output in the Eclipse console:

log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

5) Then we can execute the program wordcount.java in Eclipse.

Thanks
Balasundaram J K