Installing Hadoop 2.2.0 on single Ubuntu 12.04 x86_64 Desktop
From Notes_Wiki
Latest revision as of 05:43, 15 February 2023
Installation steps
- Install java as mentioned at Installing Java on Ubuntu 12.04 x86_64 Desktop
- Create user account and group for hadoop using:
- sudo groupadd hadoop
- sudo useradd hadoop -b /home -g hadoop -mkU -s /bin/bash
- cd /home/hadoop
- sudo cp -rp /etc/skel/.[^.]* .
- sudo chown -R hadoop:hadoop .
- sudo chmod -R o-rwx .
- Note -m in useradd should be specified before -k.
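The chown/chmod steps above lock the new home directory down so that "other" users get no access. A minimal sketch of verifying that, run here against a temporary directory for safety (on a real setup, point it at /home/hadoop):

```shell
# Check that a directory grants no permissions to "other" users,
# as the "chmod -R o-rwx ." step above intends.
check_no_other_access() {
    # stat -c '%A' prints symbolic permissions, e.g. drwx------
    perms=$(stat -c '%A' "$1")
    case "$perms" in
        *---) echo "OK: others have no access to $1" ;;
        *)    echo "WARN: others can access $1 ($perms)" ;;
    esac
}

# Demonstration on a temporary directory (stand-in for /home/hadoop):
d=$(mktemp -d)
chmod o-rwx "$d"
check_no_other_access "$d"
rmdir "$d"
```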
- Install openssh server using:
- sudo apt-get -y install openssh-server
- Setup password-less ssh for hadoop user:
- sudo su - hadoop
- ssh-keygen
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- chmod 0600 ~/.ssh/authorized_keys
- # To test the configuration; this should echo 'hadoop'
- ssh hadoop@localhost "echo $USER"
- exit
- Download the hadoop source from one of the mirrors linked at https://www.apache.org/dyn/closer.cgi/hadoop/common/ . Download the latest stable .tar.gz release from the stable folder (e.g. hadoop-2.2.0.tar.gz)
- Extract hadoop sources in /opt/hadoop and make hadoop:hadoop its owner:
- sudo mkdir /opt/hadoop
- cd /opt/hadoop/
- sudo tar xzf <path-to-hadoop-source>
- sudo mv hadoop-2.2.0 hadoop
- sudo chown -R hadoop:hadoop .
- Configure hadoop single-node setup using:
- Login as user hadoop:
- sudo su - hadoop
- Edit '~/.bashrc' and append
- export JAVA_HOME=/opt/jdk1.7.0_40
- export HADOOP_INSTALL=/opt/hadoop/hadoop
- export HADOOP_PREFIX=/opt/hadoop/hadoop
- export HADOOP_HOME=/opt/hadoop/hadoop
- export PATH=$PATH:$HADOOP_INSTALL/bin
- export PATH=$PATH:$HADOOP_INSTALL/sbin
- export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
- export HADOOP_COMMON_HOME=$HADOOP_INSTALL
- export HADOOP_HDFS_HOME=$HADOOP_INSTALL
- export YARN_HOME=$HADOOP_INSTALL
- export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
- export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
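All of the variables appended above point at the same install prefix, so deriving them from a single variable keeps them consistent if the install path ever changes. A sketch using the paths from this guide:

```shell
# Derive every Hadoop-related variable from one prefix so a future
# relocation only needs a single edit.
export HADOOP_INSTALL=/opt/hadoop/hadoop
export HADOOP_PREFIX="$HADOOP_INSTALL"
export HADOOP_HOME="$HADOOP_INSTALL"
export HADOOP_MAPRED_HOME="$HADOOP_INSTALL"
export HADOOP_COMMON_HOME="$HADOOP_INSTALL"
export HADOOP_HDFS_HOME="$HADOOP_INSTALL"
export YARN_HOME="$HADOOP_INSTALL"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_PREFIX/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export YARN_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin"
echo "$HADOOP_CONF_DIR"
```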
- Change folder to /opt/hadoop/hadoop/etc/hadoop
- Edit 'hadoop-env.sh' and set a proper value for JAVA_HOME such as '/opt/jdk1.7.0_40'. Do not leave it as ${JAVA_HOME}, as that does not work.
- Edit '/opt/hadoop/hadoop/libexec/hadoop-config.sh' and prepend the following line at the start of the script:
- export JAVA_HOME=/opt/jdk1.7.0_40
- Exit from the hadoop user and log in again using 'sudo su - hadoop'. Check the hadoop version using the 'hadoop version' command.
- Again change folder to /opt/hadoop/hadoop/etc/hadoop
- Use 'mkdir /opt/hadoop/tmp'
- Edit 'core-site.xml' and add following between <configuration> and </configuration>:
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/hadoop/tmp</value>
- </property>
- Edit 'yarn-site.xml' and add following between <configuration> and </configuration>:
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- Note: yarn.nodemanager.aux-services.mapreduce.shuffle.class should probably be yarn.nodemanager.aux-services.mapreduce_shuffle.class; this still needs to be verified.
- Use 'cp mapred-site.xml.template mapred-site.xml'
- Edit 'mapred-site.xml' and add following between <configuration> and </configuration> tags:
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- Setup folders for HDFS using:
- cd ~
- mkdir -p mydata/hdfs/namenode
- mkdir -p mydata/hdfs/datanode
- cd /opt/hadoop/hadoop/etc/hadoop
- Edit 'hdfs-site.xml' and put following values between <configuration> and </configuration>:
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:/home/hadoop/mydata/hdfs/namenode</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:/home/hadoop/mydata/hdfs/datanode</value>
- </property>
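The directories named in dfs.namenode.name.dir and dfs.datanode.data.dir must exist before the namenode is formatted. A sketch that creates and verifies them, using a temporary base here for safety (on the real setup the base is "$HOME/mydata/hdfs" for the hadoop user, as in the steps above):

```shell
# Create the namenode/datanode directories and confirm they exist.
# "base" is a temp dir here; in the actual setup use:
#   base="$HOME/mydata/hdfs"
base=$(mktemp -d)/mydata/hdfs
mkdir -p "$base/namenode" "$base/datanode"
for d in namenode datanode; do
    [ -d "$base/$d" ] && echo "present: $d"
done
```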
- Login as user hadoop using 'sudo su - hadoop':
- Format namenode using 'hdfs namenode -format'
- Start dfs and yarn using 'start-dfs.sh' and 'start-yarn.sh'.
- Test using 'jps'; you should see the following services running:
- 18098 Jps
- 17813 NodeManager
- 17189 DataNode
- 16950 NameNode
- 17462 SecondaryNameNode
- 17599 ResourceManager
- Access NameNode at http://localhost:50070 and ResourceManager at http://localhost:8088
- Run a sample MapReduce job using:
- cd /opt/hadoop/hadoop
- hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
- Verify using the ResourceManager UI that the task finishes successfully.
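The jps check in the steps above can be scripted. Here the daemon list is checked against the captured sample output; on a live node, replace the sample with out=$(jps):

```shell
# Verify that all expected Hadoop daemons appear in jps output.
# "out" holds the sample listing from the steps above; on a running
# node use:  out=$(jps)
out='18098 Jps
17813 NodeManager
17189 DataNode
16950 NameNode
17462 SecondaryNameNode
17599 ResourceManager'

# -w matches whole words, so "NameNode" does not also match
# "SecondaryNameNode".
for svc in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$out" | grep -qw "$svc"; then
        echo "running: $svc"
    else
        echo "MISSING: $svc"
    fi
done
```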
Some steps learned from http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1 and https://gist.github.com/ruo91/7154697
Running a word count example on single-node hadoop installation
Use the following steps:
- cd
- wget http://www.gutenberg.org/cache/epub/132/pg132.txt
- # Assuming all services are running; use jps to verify
- jps
- hadoop dfs -mkdir /user/hadoop/input
- hadoop dfs -copyFromLocal pg132.txt /user/hadoop/input/pg132.txt
- cd /opt/hadoop/hadoop
- hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hadoop/input/pg132.txt /user/hadoop/output/wordcount
- hadoop dfs -cat /user/hadoop/output/wordcount/p* | less
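What the wordcount job computes can be approximated in plain shell on a tiny sample: tr splits the text into words (map), sort groups identical words together (shuffle), and uniq -c counts each group (reduce):

```shell
# Plain-shell analogue of the MapReduce wordcount example:
# split into words (map), group identical words (shuffle),
# count each group (reduce), then order by count.
printf 'the quick fox jumps over the lazy fox\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```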
To stop running hadoop daemons use:
- stop-yarn.sh
- stop-dfs.sh
Steps learned from https://giraph.apache.org/quick_start.html