Install hadoop in a single Cent-OS node

From Notes_Wiki
Revision as of 10:01, 27 January 2014 by Saurabh (talk | contribs)



  1. Install Oracle Java. Steps can be learned from Install oracle java in Cent-OS
  2. Create user account and password for hadoop using:
    sudo /sbin/useradd hadoop
    sudo /usr/bin/passwd hadoop
  3. Configure key based login from hadoop to hadoop itself using:
    sudo su - hadoop
    ssh-keygen
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    #To test configuration; should print hadoop without asking for a password
    ssh hadoop@localhost 'echo $USER'
    exit
  4. Download a hadoop release from one of the mirrors linked at https://www.apache.org/dyn/closer.cgi/hadoop/common/ Download the latest stable .tar.gz release from the stable folder (e.g., hadoop-1.2.1.tar.gz).
  5. Extract hadoop sources in /opt/hadoop and make hadoop:hadoop its owner using:
    sudo mkdir /opt/hadoop
    cd /opt/hadoop/
    sudo tar xzf <path-to-hadoop-source>
    sudo mv hadoop-1.2.1 hadoop
    sudo chown -R hadoop:hadoop .
  6. Configure hadoop for a single-node setup using:
    1. Login as user hadoop and change directory to /opt/hadoop/hadoop using:
      sudo su - hadoop
      cd /opt/hadoop/hadoop
    2. Edit conf/core-site.xml and insert the following within the configuration tag:
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000/</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    3. Edit conf/hdfs-site.xml and insert the following within the configuration tag:
      <property>
        <name>dfs.data.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name/data</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    4. Edit conf/mapred-site.xml and insert the following within the configuration tag:
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    5. Edit conf/hadoop-env.sh and make the following changes:
      • Uncomment JAVA_HOME and set it to export JAVA_HOME=/opt/jdk1.7.0_40 or an appropriate value based on the installed Java
      • Uncomment HADOOP_OPTS and set it to
        export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
  7. Format namenode using:
    ./bin/hadoop namenode -format
  8. Start all services using:
    ./bin/start-all.sh
  9. Verify that all services got started using the 'jps' command, whose output should be similar to the following (with different process IDs):
    26049 SecondaryNameNode
    25929 DataNode
    26399 Jps
    26129 JobTracker
    26249 TaskTracker
    25807 NameNode
  10. Try to access the different service web UIs at http://localhost:50030/ (JobTracker), http://localhost:50060/ (TaskTracker) and http://localhost:50070/ (NameNode).
  11. To stop all services use:
    ./bin/stop-all.sh
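The XML edits in step 6 can also be scripted. The following is a sketch that reuses exactly the values from the steps above; it writes into a scratch directory ./conf-sketch rather than the live /opt/hadoop/hadoop/conf, so review the generated files and copy them over yourself.

```shell
#!/bin/sh
# Sketch: generate the three single-node config files from step 6.
# CONF is a scratch directory, not the real /opt/hadoop/hadoop/conf.
CONF=./conf-sketch
mkdir -p "$CONF"

cat > "$CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
EOF

cat > "$CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

cat > "$CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```

After reviewing, copy the three files into /opt/hadoop/hadoop/conf as user hadoop.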


Steps learned from http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/
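The service check in step 9 can be scripted as well. A minimal sketch that reads jps-style output ('<pid> <name>' per line) on stdin and reports any missing daemons; the daemon names are those from the sample output above.

```shell
#!/bin/sh
# Sketch: verify the five Hadoop daemons from step 9 against 'jps' output.
# Reads jps-style lines on stdin and prints which daemons are missing.
check_daemons() {
    out=$(cat)
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        echo "$out" | grep -qw "$daemon" || missing="$missing $daemon"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
    else
        echo "all daemons running"
    fi
}

# Typical use on the live system (jps ships with the JDK):
# jps | check_daemons
```

If any daemon is reported missing, re-run ./bin/start-all.sh and inspect the files under the logs/ directory.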


Verify hadoop installation by running a map-reduce job

We will try to run hadoop wordcount example on large text file to verify hadoop is functioning properly:

  1. Download a large text file into /opt/hadoop/data using:
    mkdir -p /opt/hadoop/data
    cd /opt/hadoop/data
    wget http://www.gutenberg.org/cache/epub/132/pg132.txt
  2. Verify that all hadoop services are running using jps. If they are not running use ../hadoop/bin/start-all.sh to start everything.
  3. Copy file from local filesystem to hdfs using:
    ../hadoop/bin/hadoop dfs -copyFromLocal pg132.txt /user/hduser/input/pg132.txt
  4. Verify file got copied using:
    ../hadoop/bin/hadoop dfs -ls /user/hduser/input
    Note that you can find more dfs commands using:
    ../hadoop/bin/hadoop dfs -help
  5. Open various hadoop web UIs in different browser tabs:
    http://localhost:50030/
    Shows number of map / reduce processes
    http://localhost:50060/
    Shows number of hadoop tasks being executed
    http://localhost:50070/
    Shows number of live nodes and also provides a file browser to browse hdfs
  6. Start hadoop job using:
    ../hadoop/bin/hadoop jar ../hadoop/hadoop-examples-1.2.1.jar wordcount /user/hduser/input/pg132.txt /user/hduser/output/wordcount
    The examples jar version might be different from 1.2.1 based on installed version of hadoop.
  7. Check the output using:
    ../hadoop/bin/hadoop dfs -cat /user/hduser/output/wordcount/p* | less
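To sanity-check Hadoop's result, the same counts can be computed locally. A minimal sketch that counts whitespace-separated tokens (the wordcount example tokenizes similarly, so counts for most plain words should match); the word 'Arthur' in the usage comments is just an illustrative token, not part of the original steps.

```shell
#!/bin/sh
# Sketch: local word count for cross-checking Hadoop's wordcount output.
# Splits input on whitespace, then counts occurrences of each token.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c
}

# Typical use, comparing the local count of one word with Hadoop's output:
# local_wordcount < /opt/hadoop/data/pg132.txt | grep -w 'Arthur'
# ../hadoop/bin/hadoop dfs -cat /user/hduser/output/wordcount/p* | grep -w 'Arthur'
```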

Steps learned from https://giraph.apache.org/quick_start.html


