This setup has 7 datanodes and 1 namenode:
hadoopNN01: 192.168.1.171
hadoopDN01: 192.168.1.172
hadoopDN02: 192.168.1.173
hadoopDN03: 192.168.1.174
hadoopDN04: 192.168.1.175
hadoopDN05: 192.168.1.176
hadoopDN06: 192.168.1.177
hadoopDN07: 192.168.1.178
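If the nodes are meant to reach each other by name, the list above can be turned into hosts-file entries. This is a minimal sketch that assumes the names and addresses listed above; it only prints the lines, and appending them to /etc/hosts on each node (which needs root) is left as a manual step.

```shell
#!/bin/sh
# Print hosts-file entries for the cluster described above.
# The addresses run consecutively from .171 (namenode) to .178 (datanode 7).
cluster_hosts() {
  echo "192.168.1.171 hadoopNN01"
  i=1
  while [ "$i" -le 7 ]; do
    # Datanode N has address 192.168.1.(171 + N)
    echo "192.168.1.$((171 + i)) hadoopDN0$i"
    i=$((i + 1))
  done
}

cluster_hosts
```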
-Hadoop 0.19.2 (there is a newer version, but I will install this one for compatibility with other tools I am planning to install, e.g. Hive, Pig, etc.)
-Install Java JDK on each server
-Create a hadoop account on each server
-Configure SSH
Server 1
$ ssh-keygen -t dsa
Server 2
$ mkdir .ssh    # under the home directory
Server 1
$ scp .ssh/id_dsa.pub hadoopnn01:/home/hadoop/.ssh/authorized_keys2
Server 2
$ chmod 700 .ssh
$ chmod 600 .ssh/authorized_keys2
Do the same for every pair of servers, so that each server can SSH to the others without a password
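Repeating those steps by hand for seven datanodes gets tedious, so the key distribution can be sketched as a loop. This assumes the account is called hadoop on every node and that password login still works for the first connection. DRY_RUN is a hypothetical switch invented for this sketch: by default it only prints what would be run.

```shell
#!/bin/sh
# Datanode addresses taken from the host list above.
DATANODES="192.168.1.172 192.168.1.173 192.168.1.174 192.168.1.175 192.168.1.176 192.168.1.177 192.168.1.178"
PUBKEY="$HOME/.ssh/id_dsa.pub"

# Runs on the remote side: create ~/.ssh, append the key read from stdin,
# then apply the same permissions as the manual steps.
REMOTE_CMD='mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys2 && chmod 600 ~/.ssh/authorized_keys2'

# DRY_RUN is a hypothetical switch for this sketch: 1 (the default) only prints.
push_key() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: ssh hadoop@$1 '$REMOTE_CMD' < $PUBKEY"
  else
    ssh "hadoop@$1" "$REMOTE_CMD" < "$PUBKEY"
  fi
}

for node in $DATANODES; do
  push_key "$node"
done
```

Unlike the scp step, this appends to authorized_keys2 instead of overwriting it, so keys from several servers can coexist in the same file. Set DRY_RUN=0 to actually run the ssh commands.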
Unpack Hadoop under /usr/local on each node
Set the following variables on each node:
export HADOOP_HOME=/usr/local/hadoop-0.19.2
export JAVA_HOME=/usr/local/src/jdk1.6.0_12
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin
Edit conf/hadoop-site.xml (this file must be present on all nodes)
Copy this file to all nodes
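The contents of hadoop-site.xml are not shown here, so the following is only a sketch of what a minimal file for this cluster might look like, not the original configuration. The namenode address and the hdfs/name and hdfs/data directories are the ones used in this guide; ports 9000 and 9001, the replication factor of 3, and running the JobTracker on the namenode host are assumptions.

```shell
#!/bin/sh
# Write a minimal hadoop-site.xml. Ports 9000/9001 and the replication
# factor are assumptions for illustration, not values from the original setup.
CONF_DIR="${HADOOP_HOME:-.}/conf"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.171:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.171:9001</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop-0.19.2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-0.19.2/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
```

If HADOOP_HOME is not set, the file is written to ./conf so it can be reviewed before being copied to the nodes.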
Create the directory /usr/local/hadoop-0.19.2/hdfs/data on each data node
Create the directory /usr/local/hadoop-0.19.2/hdfs/name on the name node
Edit conf/slaves and add all the data nodes:
192.168.1.172
192.168.1.173
192.168.1.174
192.168.1.175
192.168.1.176
192.168.1.177
192.168.1.178
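Since the datanode addresses are consecutive, the slaves file can also be generated instead of typed. A minimal sketch, assuming the address range above; it writes to ./conf when HADOOP_HOME is not set.

```shell
#!/bin/sh
# Generate conf/slaves with one datanode address per line.
CONF_DIR="${HADOOP_HOME:-.}/conf"
mkdir -p "$CONF_DIR"

: > "$CONF_DIR/slaves"            # start from an empty file
last_octet=172
while [ "$last_octet" -le 178 ]; do
  echo "192.168.1.$last_octet" >> "$CONF_DIR/slaves"
  last_octet=$((last_octet + 1))
done

cat "$CONF_DIR/slaves"
```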
On the name node, start HDFS and MapReduce:
$ bin/start-dfs.sh
$ bin/start-mapred.sh
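Once both scripts have run, it is worth checking that all seven datanodes registered with the namenode. The sketch below counts them from the output of hadoop dfsadmin -report. It assumes the report contains a summary line of the form "Datanodes available: 7 (7 total, 0 dead)"; the exact wording may differ between Hadoop versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# Extract the datanode count from `hadoop dfsadmin -report` output on stdin.
# Assumes a summary line like "Datanodes available: 7 (7 total, 0 dead)".
available_datanodes() {
  sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query the cluster when the hadoop launcher is actually on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  count=$(hadoop dfsadmin -report | available_datanodes)
  echo "datanodes available: ${count:-unknown} (expected 7)"
fi
```

Running jps (shipped with the JDK) on each node is another quick check that the Hadoop daemons are up.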
Hive comes with this distribution, but we need to install Derby to have a centralized metadata repository
Untar the Derby archive in /usr/local/hadoop-0.19.2
Set the environment variables:
$ export DERBY_INSTALL=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$ export DERBY_HOME=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$ export HADOOP=/usr/local/hadoop-0.19.2/bin/hadoop
$ cd /usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin/data
Start Derby
$ nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 &
Copy derbyclient.jar and derbytools.jar to Hive's lib directory
$ cp $DERBY_HOME/lib/derbyclient.jar /usr/local/hadoop-0.19.2/contrib/hive/lib
$ cp $DERBY_HOME/lib/derbytools.jar /usr/local/hadoop-0.19.2/contrib/hive/lib
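Before pointing Hive at Derby, it can help to confirm that the network server is answering. A minimal sketch using the NetworkServerControl script from Derby's bin directory; it assumes Derby listens on port 1527 (the port used in the connection URL in this guide) and falls back to the install path above when DERBY_HOME is not set.

```shell
#!/bin/sh
# Ping the Derby network server with the NetworkServerControl script that
# ships in Derby's bin directory. Returns 2 if the script cannot be found.
derby_ping() {
  ctl="${DERBY_HOME:-/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin}/bin/NetworkServerControl"
  if [ ! -x "$ctl" ]; then
    echo "NetworkServerControl not found at $ctl" >&2
    return 2
  fi
  "$ctl" ping -h localhost -p 1527
}

derby_ping || echo "Derby is not reachable on localhost:1527"
```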
Edit the files /usr/local/hadoop-0.19.2/contrib/hive/conf/hive-default.xml and /usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties
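hive-default.xml:
<property>
  <name>hadoop.bin.path</name>
  <value>${user.dir}/../../../bin/hadoop</value>
  <description>Path to hadoop binary. Assumes that by default we are executing from hive</description>
</property>
<property>
  <name>hadoop.config.dir</name>
  <value>${user.dir}/../../../conf</value>
  <description>Path to hadoop configuration. Again assumes that by default we are executing from hive/</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://</value>
  <description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>
<property>
  <name>hive.metastore.metadb.dir</name>
  <value>file:///var/metastore/metadb/</value>
  <description>The location of filestore metadata base dir</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/metastore/metadb/</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of default database for the warehouse</description>
</property>
<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to metastore</description>
</property>
<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used for storage and retrieval of raw metadata objects such as table, database</description>
</property>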
/usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties:
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
org.jpox.cache.level2=true
org.jpox.cache.level2.type=SOFT
Run Hive
$ /usr/local/hadoop-0.19.2/contrib/hive/bin/hive
With this setup, all the metadata definitions are stored in a centralized repository backed by a Derby database.