Install giraph in hadoop node
From Notes_Wiki
Latest revision as of 13:50, 7 April 2022
- Set up Hadoop on a single CentOS node as explained at Install hadoop in a single Cent-OS node
- Create a directory for temporary files, such as /opt/hadoop/tmp, and add the following inside the <configuration> element of the 'conf/core-site.xml' file:
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
    </property>
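The directory step can be scripted. The sketch below assumes it is run as the user that will run Hadoop, and `prepare_tmp_dir` is a hypothetical helper name, not part of Hadoop. It creates the directory if needed and fails if that user cannot write to it.

```shell
# Hypothetical helper: create the Hadoop temporary directory (if missing)
# and check that the current user can write to it.
prepare_tmp_dir() {
  local dir=$1
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "error: $dir is not writable by $(id -un)" >&2
    return 1
  fi
}

# Example (run as the hadoop user):
#   prepare_tmp_dir /opt/hadoop/tmp
```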
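- Edit the conf/mapred-site.xml file and add the following inside the <configuration> element to allow 4 map tasks to run in parallel (the first property limits how many map tasks a TaskTracker runs at once, the second sets the default number of map tasks per job):
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
    </property>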
- Edit conf/hdfs-site.xml and add the following inside the <configuration> element:
    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description></description>
    </property>
  This configures HDFS to keep only one copy of each data block, effectively disabling replication.
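To confirm the configuration edits above, a rough check like the following can help. `has_property` is a hypothetical helper, not a Hadoop tool. It assumes the layout used on this page, where `<value>` directly follows `<name>` and values contain no whitespace.

```shell
# Hypothetical helper: succeed if FILE defines property NAME with VALUE.
# All whitespace is stripped from the file first, so the check relies on
# <value> directly following <name> (as in the snippets on this page).
has_property() {
  local file=$1 name=$2 value=$3
  tr -d ' \t\r\n' < "$file" |
    grep -qF "<name>$name</name><value>$value</value>"
}

# Example, from the Hadoop install directory:
#   has_property conf/core-site.xml hadoop.tmp.dir /opt/hadoop/tmp
#   has_property conf/mapred-site.xml mapred.tasktracker.map.tasks.maximum 4
#   has_property conf/mapred-site.xml mapred.map.tasks 4
#   has_property conf/hdfs-site.xml dfs.replication 1
```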
- Format the NameNode using ./bin/hadoop namenode -format, but only if it has not been formatted already; formatting erases any existing HDFS data.
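The "only if not formatted already" check can be scripted. This sketch assumes dfs.name.dir is not set, so the NameNode metadata lives under ${hadoop.tmp.dir}/dfs/name (here /opt/hadoop/tmp/dfs/name), and that a formatted directory contains current/VERSION. `needs_format` is a hypothetical name.

```shell
# Hypothetical helper: succeed (i.e. "format needed") when the NameNode
# metadata directory has no current/VERSION file yet.
needs_format() {
  local name_dir=$1
  [ ! -f "$name_dir/current/VERSION" ]
}

# Example, from the Hadoop install directory (assumed default name dir):
#   if needs_format /opt/hadoop/tmp/dfs/name; then
#     ./bin/hadoop namenode -format
#   fi
```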
- Start all Hadoop services using ./bin/start-all.sh
- Install Maven using:
    sudo yum -y install maven
- Verify that the installed version is 3.0 or later using mvn --version
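The version check can also be done in a script. This is a sketch assuming GNU `sort -V` (coreutils) and that the first line of `mvn --version` has the form "Apache Maven X.Y.Z ..."; both function names are hypothetical.

```shell
# Hypothetical helper: print the Maven version from `mvn --version`
# output on stdin, e.g. "Apache Maven 3.0.5 (...)" -> "3.0.5".
parse_maven_version() {
  awk 'NR == 1 && $1 == "Apache" && $2 == "Maven" { print $3 }'
}

# Succeed if dotted version $1 is >= $2 (uses GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   v=$(mvn --version | parse_maven_version)
#   if version_ge "$v" 3.0; then echo "Maven $v OK"; else echo "Maven $v too old"; fi
```

An empty version string (for example, when Maven is not installed) sorts first, so `version_ge "" 3.0` fails as intended.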
- Download the latest stable Giraph release from https://www.apache.org/dyn/closer.cgi/giraph/
- Extract the Giraph source into the /opt/hadoop/giraph folder
- Make sure the Giraph files are owned by hadoop:hadoop, for example with sudo chown -R hadoop:hadoop /opt/hadoop/giraph
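To check the ownership requirement, a `find`-based sketch like this lists anything not owned by the expected user and group. `wrong_owner` is a hypothetical name; any paths it prints can be fixed with `chown -R`.

```shell
# Hypothetical helper: print every path under DIR whose owner is not USER
# or whose group is not GROUP; prints nothing if ownership is correct.
wrong_owner() {
  local dir=$1 owner=$2
  local user=${owner%%:*} group=${owner##*:}
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example:
#   wrong_owner /opt/hadoop/giraph hadoop:hadoop
```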
- Edit ~/.bash_profile for the hadoop user and add:
    export GIRAPH_HOME=/opt/hadoop/giraph
- Log out of the hadoop user and log in again. Verify that the variable is set using:
    set | grep GIRAPH
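Adding the export line by hand is fine. The sketch below does the same thing but only once, so re-running a setup script does not duplicate the line. `add_giraph_home` is a hypothetical name, and the profile path is passed explicitly.

```shell
# Hypothetical helper: append "export GIRAPH_HOME=DIR" to PROFILE unless
# the file already has a line starting with "export GIRAPH_HOME=".
add_giraph_home() {
  local profile=$1 home=$2
  if ! grep -q '^export GIRAPH_HOME=' "$profile" 2>/dev/null; then
    printf 'export GIRAPH_HOME=%s\n' "$home" >> "$profile"
  fi
}

# Example (as the hadoop user):
#   add_giraph_home ~/.bash_profile /opt/hadoop/giraph
```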
- Build Giraph using:
    cd $GIRAPH_HOME
    mvn package
  To skip running the tests during the build, use:
    mvn package -DskipTests
- If the build succeeds, the folder 'giraph-core/target' should contain a file named 'giraph-<ver>-for-hadoop-<ver>-jar-with-dependencies.jar', and the folder 'giraph-examples/target/' should contain a similarly named jar for the examples.
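A quick way to confirm both jars exist is sketched below. `check_giraph_jars` is a hypothetical helper, and it assumes the "*-for-hadoop-*-jar-with-dependencies.jar" naming described above.

```shell
# Hypothetical helper: print the jar-with-dependencies built in each of
# giraph-core/target and giraph-examples/target under ROOT (default:
# $GIRAPH_HOME); report missing ones on stderr and return non-zero.
check_giraph_jars() {
  local root=${1:-$GIRAPH_HOME} status=0 module jar
  for module in giraph-core giraph-examples; do
    jar=$(ls "$root/$module/target/"*-for-hadoop-*-jar-with-dependencies.jar 2>/dev/null | head -n 1)
    if [ -n "$jar" ]; then
      echo "$jar"
    else
      echo "missing: $module/target/*-jar-with-dependencies.jar" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (with GIRAPH_HOME set as above):
#   check_giraph_jars
```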
Steps learned from https://giraph.apache.org/quick_start.html
Testing Giraph by running a simple Giraph job
To verify the Giraph installation, run a simple single-source shortest-paths job:
- Create an input file named tiny_graph.txt with the following data in the '/opt/hadoop/data' folder:
    [0,0,[[1,1],[3,3]]]
    [1,0,[[0,1],[2,2],[3,1]]]
    [2,0,[[1,2],[4,4]]]
    [3,0,[[0,3],[1,1],[4,4]]]
    [4,0,[[3,4],[2,4]]]
  Each line has the format '[source_id,source_value,[[dest_id,edge_value],...]]', so this graph has 5 vertices and 12 directed edges. From the '/opt/hadoop/data' folder, copy the input file to HDFS:
    ../hadoop/bin/hadoop dfs -copyFromLocal tiny_graph.txt /user/hduser/input/tiny_graph.txt
    ../hadoop/bin/hadoop dfs -ls /user/hduser/input
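The input file can also be created from a script, and the vertex and edge counts can be checked mechanically. This sketch uses hypothetical function names and assumes the one-vertex-per-line layout shown. On each line, every "[" except the outer one and the one opening the edge list starts an edge.

```shell
# Hypothetical helper: write the sample graph to FILE.
write_tiny_graph() {
  cat > "$1" <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF
}

# Print "<vertices> <edges>" for a file in this format: one vertex per
# line, and each line has (number of "[") - 2 edges.
count_graph() {
  awk '{ edges += gsub(/\[/, "[") - 2 } END { print NR, edges }' "$1"
}

# Example:
#   write_tiny_graph /opt/hadoop/data/tiny_graph.txt
#   count_graph /opt/hadoop/data/tiny_graph.txt   # prints: 5 12
```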
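- Run the job using the command below. Here -vif and -vip give the vertex input format and input path, -of and -op give the output format and output path, and -w gives the number of workers. The jar name is for Giraph 1.0.0 built for Hadoop 0.20.203.0, so adjust it to match the jar produced by your build:
    ../hadoop/bin/hadoop jar /opt/hadoop/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
      org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex \
      -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
      -vip /user/hduser/input/tiny_graph.txt \
      -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
      -op /user/hduser/output/shortestpaths \
      -w 1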
- Check the output using:
    ../hadoop/bin/hadoop dfs -cat /user/hduser/output/shortestpaths/p* | less
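To check the result without reading it by eye, you can work out the expected distances by hand. This assumes SimpleShortestPathsVertex uses its default source vertex id 1 and that IdWithValueTextOutputFormat writes "id<TAB>value" lines. Under those assumptions, the shortest distances from vertex 1 are: vertex 0: 1, vertex 2: 2, vertex 3: 1, and vertex 4: 5. Vertex 4 is reached via vertex 3 (cost 1 + 4) rather than via vertex 2 (cost 2 + 4). The hypothetical helper below compares the job output against those values.

```shell
# Hypothetical helper: read job output ("id<TAB>distance" lines, any order)
# on stdin and succeed only if it matches the hand-computed distances for
# tiny_graph.txt with source vertex 1.
check_shortest_paths() {
  local expected actual
  expected=$(printf '0\t1.0\n1\t0.0\n2\t2.0\n3\t1.0\n4\t5.0\n')
  actual=$(sort -n)
  if [ "$actual" = "$expected" ]; then
    echo "shortest paths OK"
  else
    echo "unexpected output:" >&2
    printf '%s\n' "$actual" >&2
    return 1
  fi
}

# Example:
#   ../hadoop/bin/hadoop dfs -cat /user/hduser/output/shortestpaths/p* | check_shortest_paths
```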