Tuesday, October 27, 2009

Simple Hadoop Hive Derby Installation and configuration

This article shows how to install Hadoop and get hive working with centralized metadata repository based on Derby database:


For this case we are going to have 7 datanodes and 1 namenode


hadoopNN01: 192.168.1.171
hadoopDN01: 192.168.1.172
hadoopDN02: 192.168.1.173
hadoopDN03: 192.168.1.174
hadoopDN04: 192.168.1.175
hadoopDN05: 192.168.1.176
hadoopDN06: 192.168.1.177
hadoopDN07: 192.168.1.178


-Hadoop 0.19.2 (There is a new version, but I will install this one for compatibility with other tools I am planning to install (i.e. hive, pig, etc)
-Install Java JDK on each server
-Create a hadoop account on each server
-Configure SSH 


Server 1
ssh-keygen -t dsa

Server 2
$mkdir .ssh (Under the home directory) 
Server 1
scp .ssh/id_dsa.hadoopnn01:/home/hadoop/.ssh/authorized_keys2
Server 2
$chmod 700 .ssh 
$chmod 600 .ssh/authorized_keys2


Do the same for all servers in order to guarantee can be sshed between them without password




Unpacked it under /user/local on each node



Set the following variables, on each node:

export HADOOP_HOME=/usr/local/hadoop-0.19.2
export JAVA_HOME=/usr/local/src/jdk1.6.0_12
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin


Edit conf/hadoop-site.xml (This file must be in all nodes)








 
    fs.default.name
    hdfs://192.168.1.171
 
 
    dfs.data.dir
    /usr/local/hadoop-0.19.2/hdfs/data
    true
 
 

    dfs.name.dir
    /usr/local/hadoop-0.19.2/hdfs/name
    true
 
 
    hadoop.tmp.dir
    /usr/local/hadoop-0.19.2/tmp
    true
 
 
    mapred.system.dir
    /usr/local/hadoop-0.19.2/mapred/system
    true
 
 
    dfs.replication
    3
 
 
    mapred.job.tracker
    192.168.1.171:8021
 




Copy this files to all nodes



Create the directory /usr/local/hadoop-0.19.2/hdfs/data directory on each data node

Create the directory /usr/local/hadoop-0.19.2/hdfs/name directory in the name node


edit conf/slaves and add all the data nodes



192.168.1.172
192.168.1.173
192.168.1.174
192.168.1.175
192.168.1.176
192.168.1.177
192.168.1.178




On the Name node: start the cluster and mapreduce



$bin/start-dfs.sh

$bin/start-mapred.sh


Hive comes with this distribution,,,, but we need to install derby to have a centralized metadata repository




Untar the file in /usr/local/hadoop-0.19.2


Set the environment variables:



$export DERBY_INSTALL=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$export DERBY_HOME=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$export HADOOP=/usr/local/hadoop-0.19.2/bin/hadoop




$cd /usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin/data


Start derby


$nohup /usr/local/db-derby-10.5.1.1-bin/bin/startNetworkServer -h 0.0.0.0 &



CP derbyclient.jar and derbytools.jar to the lib's hive directory



$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbyclient.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbytools.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

Edit the files /usr/local/hadoop-0.19.2/contrib/hive/conf/hive-default.xml and /usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties

hive-default.xml:












  hadoop.bin.path
  ${user.dir}/../../../bin/hadoop
  
  Path to hadoop binary. Assumes that by default we are executing from hive



  hadoop.config.dir
  ${user.dir}/../../../conf
  
  Path to hadoop configuration. Again assumes that by default we are executing from hive/




  hive.exec.scratchdir
  /tmp/hive-${user.name}
  Scratch space for Hive jobs



  hive.metastore.local
  true
  controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM



  javax.jdo.option.ConnectionURL
  jdbc:derby://localhost:1527/metastore_db;create=true
  JDBC connect string for a JDBC metastore



  javax.jdo.option.ConnectionDriverName
  org.apache.derby.jdbc.ClientDriver
  Driver class name for a JDBC metastore



  hive.metastore.uris
  thift://
  Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.



  hive.metastore.metadb.dir
  file:///var/metastore/metadb/
  The location of filestore metadata base dir
  hive.metastore.uris
  file:///var/metastore/metadb/
  
  hive.metastore.warehouse.dir
  /user/hive/warehouse
  location of default database for the warehouse
  hive.metastore.connect.retries
  5
  Number of retries while opening a connection to metastore
  hive.metastore.rawstore.impl
  org.apache.hadoop.hive.metastore.ObjectStore
  Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database
/usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties:

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
org.jpox.cache.level2=true
org.jpox.cache.level2.type=SOFT
Run hive
/usr/local/hadoop-0.19.2/contrib/hive/bin/hive
Doing this all the metadata definitions are going to be stored in a centralized repository based on Derby database.









1 comment:

  1. This is such a great resource that you are providing and you give it away for free. I love seeing websites that understand the value of providing a quality resource for free.
    Hadoop Training in hyderabad

    ReplyDelete