Installing Hadoop 2.2.0 on single Ubuntu 12.04 x86_64 Desktop
From Notes_Wiki
Latest revision as of 05:43, 15 February 2023
Installation steps
- Install java as mentioned at Installing Java on Ubuntu 12.04 x86_64 Desktop
- Create user account and group for hadoop using:
- sudo groupadd hadoop
- sudo useradd hadoop -b /home -g hadoop -mkU -s /bin/bash
- cd /home/hadoop
- sudo cp -rp /etc/skel/.[^.]* .
- sudo chown -R hadoop:hadoop .
- sudo chmod -R o-rwx .
- Note -m in useradd should be specified before -k.
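The chown/chmod steps above lock the new home directory down so that "other" users get no access. A minimal sketch of verifying that, run here against a temporary directory for safety (on a real setup, point it at /home/hadoop):

```shell
# Check that a directory grants no permissions to "other" users,
# as the "chmod -R o-rwx ." step above intends.
check_no_other_access() {
    # stat -c '%A' prints symbolic permissions, e.g. drwx------
    perms=$(stat -c '%A' "$1")
    case "$perms" in
        *---) echo "OK: others have no access to $1" ;;
        *)    echo "WARN: others can access $1 ($perms)" ;;
    esac
}

# Demonstration on a temporary directory (stand-in for /home/hadoop):
d=$(mktemp -d)
chmod o-rwx "$d"
check_no_other_access "$d"
rmdir "$d"
```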
- Install openssh server using:
- sudo apt-get -y install openssh-server
- Setup password-less ssh for hadoop user:
- sudo su - hadoop
- ssh-keygen
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- chmod 0600 ~/.ssh/authorized_keys
- # To test the configuration; this should echo 'hadoop'
- ssh hadoop@localhost "echo $USER"
- exit
- Download the hadoop source from one of the mirrors linked at https://www.apache.org/dyn/closer.cgi/hadoop/common/ . Download the latest stable .tar.gz release from the stable folder (e.g. hadoop-2.2.0.tar.gz)
- Extract hadoop sources in /opt/hadoop and make hadoop:hadoop its owner:
- sudo mkdir /opt/hadoop
- cd /opt/hadoop/
- sudo tar xzf <path-to-hadoop-source>
- sudo mv hadoop-2.2.0 hadoop
- sudo chown -R hadoop:hadoop .
- Configure hadoop single-node setup using:
- Login as user hadoop:
- sudo su - hadoop
- Edit '~/.bashrc' and append
- export JAVA_HOME=/opt/jdk1.7.0_40
- export HADOOP_INSTALL=/opt/hadoop/hadoop
- export HADOOP_PREFIX=/opt/hadoop/hadoop
- export HADOOP_HOME=/opt/hadoop/hadoop
- export PATH=$PATH:$HADOOP_INSTALL/bin
- export PATH=$PATH:$HADOOP_INSTALL/sbin
- export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
- export HADOOP_COMMON_HOME=$HADOOP_INSTALL
- export HADOOP_HDFS_HOME=$HADOOP_INSTALL
- export YARN_HOME=$HADOOP_INSTALL
- export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
- export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
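All of the variables appended above point at the same install prefix, so deriving them from a single variable keeps them consistent if the install path ever changes. A sketch using the paths from this guide:

```shell
# Derive every Hadoop-related variable from one prefix so a future
# relocation only needs a single edit.
export HADOOP_INSTALL=/opt/hadoop/hadoop
export HADOOP_PREFIX="$HADOOP_INSTALL"
export HADOOP_HOME="$HADOOP_INSTALL"
export HADOOP_MAPRED_HOME="$HADOOP_INSTALL"
export HADOOP_COMMON_HOME="$HADOOP_INSTALL"
export HADOOP_HDFS_HOME="$HADOOP_INSTALL"
export YARN_HOME="$HADOOP_INSTALL"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_PREFIX/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export YARN_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin"
echo "$HADOOP_CONF_DIR"
```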
- Change folder to /opt/hadoop/hadoop/etc/hadoop
- Edit 'hadoop-env.sh' and set a proper value for JAVA_HOME such as '/opt/jdk1.7.0_40'. Do not leave it as ${JAVA_HOME}, as that does not work.
- Edit '/opt/hadoop/hadoop/libexec/hadoop-config.sh' and prepend the following line at the start of the script:
- export JAVA_HOME=/opt/jdk1.7.0_40
- Exit from the hadoop user and log in again using 'sudo su - hadoop'. Check the hadoop version using the 'hadoop version' command.
- Again change folder to /opt/hadoop/hadoop/etc/hadoop
- Use 'mkdir /opt/hadoop/tmp'
- Edit 'core-site.xml' and add following between <configuration> and </configuration>:
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/hadoop/tmp</value>
- </property>
- Edit 'yarn-site.xml' and add following between <configuration> and </configuration>:
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- Note: yarn.nodemanager.aux-services.mapreduce.shuffle.class should probably be yarn.nodemanager.aux-services.mapreduce_shuffle.class; this still needs to be verified.
- Use 'cp mapred-site.xml.template mapred-site.xml'
- Edit 'mapred-site.xml' and add following between <configuration> and </configuration> tags:
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- Setup folders for HDFS using:
- cd ~
- mkdir -p mydata/hdfs/namenode
- mkdir -p mydata/hdfs/datanode
- cd /opt/hadoop/hadoop/etc/hadoop
- Edit 'hdfs-site.xml' and put following values between <configuration> and </configuration>:
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:/home/hadoop/mydata/hdfs/namenode</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:/home/hadoop/mydata/hdfs/datanode</value>
- </property>
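The directories named in dfs.namenode.name.dir and dfs.datanode.data.dir must exist before the namenode is formatted. A sketch that creates and verifies them, using a temporary base here for safety (on the real setup the base is "$HOME/mydata/hdfs" for the hadoop user, as in the steps above):

```shell
# Create the namenode/datanode directories and confirm they exist.
# "base" is a temp dir here; in the actual setup use:
#   base="$HOME/mydata/hdfs"
base=$(mktemp -d)/mydata/hdfs
mkdir -p "$base/namenode" "$base/datanode"
for d in namenode datanode; do
    [ -d "$base/$d" ] && echo "present: $d"
done
```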
- Login as user hadoop using 'sudo su - hadoop':
- Format namenode using 'hdfs namenode -format'
- Start dfs and yarn using 'start-dfs.sh' and 'start-yarn.sh'.
- Test using 'jps'; you should see the following services running:
- 18098 Jps
- 17813 NodeManager
- 17189 DataNode
- 16950 NameNode
- 17462 SecondaryNameNode
- 17599 ResourceManager
- Access NameNode at http://localhost:50070 and ResourceManager at http://localhost:8088
- Run a sample MapReduce job using:
- cd /opt/hadoop/hadoop
- hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
- Verify using the ResourceManager UI that the task finishes successfully.
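The jps check in the steps above can be scripted. Here the daemon list is checked against the captured sample output; on a live node, replace the sample with out=$(jps):

```shell
# Verify that all expected Hadoop daemons appear in jps output.
# "out" holds the sample listing from the steps above; on a running
# node use:  out=$(jps)
out='18098 Jps
17813 NodeManager
17189 DataNode
16950 NameNode
17462 SecondaryNameNode
17599 ResourceManager'

# -w matches whole words, so "NameNode" does not also match
# "SecondaryNameNode".
for svc in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$out" | grep -qw "$svc"; then
        echo "running: $svc"
    else
        echo "MISSING: $svc"
    fi
done
```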
Some steps learned from http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1 and https://gist.github.com/ruo91/7154697
Running a word count example on single-node hadoop installation
Use the following steps:
- cd
- wget http://www.gutenberg.org/cache/epub/132/pg132.txt
- # Assuming all services are running; use jps to verify
- jps
- hadoop dfs -mkdir /user/hadoop/input
- hadoop dfs -copyFromLocal pg132.txt /user/hadoop/input/pg132.txt
- cd /opt/hadoop/hadoop
- hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hadoop/input/pg132.txt /user/hadoop/output/wordcount
- hadoop dfs -cat /user/hadoop/output/wordcount/p* | less
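What the wordcount job computes can be approximated in plain shell on a tiny sample: tr splits the text into words (map), sort groups identical words together (shuffle), and uniq -c counts each group (reduce):

```shell
# Plain-shell analogue of the MapReduce wordcount example:
# split into words (map), group identical words (shuffle),
# count each group (reduce), then order by count.
printf 'the quick fox jumps over the lazy fox\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```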
To stop running hadoop daemons use:
- stop-yarn.sh
- stop-dfs.sh
Steps learned from https://giraph.apache.org/quick_start.html