Tuesday, October 27, 2009

Simple Hadoop Hive Derby Installation and configuration

This article shows how to install Hadoop and get hive working with centralized metadata repository based on Derby database:


For this case we are going to have 7 datanodes and 1 namenode


hadoopNN01: 192.168.1.171
hadoopDN01: 192.168.1.172
hadoopDN02: 192.168.1.173
hadoopDN03: 192.168.1.174
hadoopDN04: 192.168.1.175
hadoopDN05: 192.168.1.176
hadoopDN06: 192.168.1.177
hadoopDN07: 192.168.1.178


-Hadoop 0.19.2 (There is a new version, but I will install this one for compatibility with other tools I am planning to install (i.e. hive, pig, etc)
-Install Java JDK on each server
-Create a hadoop account on each server
-Configure SSH 


Server 1
ssh-keygen -t dsa

Server 2
$mkdir .ssh (Under the home directory) 
Server 1
scp .ssh/id_dsa.hadoopnn01:/home/hadoop/.ssh/authorized_keys2
Server 2
$chmod 700 .ssh 
$chmod 600 .ssh/authorized_keys2


Do the same for all servers in order to guarantee can be sshed between them without password




Unpacked it under /user/local on each node



Set the following variables, on each node:

export HADOOP_HOME=/usr/local/hadoop-0.19.2
export JAVA_HOME=/usr/local/src/jdk1.6.0_12
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin


Edit conf/hadoop-site.xml (This file must be in all nodes)








 
    fs.default.name
    hdfs://192.168.1.171
 
 
    dfs.data.dir
    /usr/local/hadoop-0.19.2/hdfs/data
    true
 
 

    dfs.name.dir
    /usr/local/hadoop-0.19.2/hdfs/name
    true
 
 
    hadoop.tmp.dir
    /usr/local/hadoop-0.19.2/tmp
    true
 
 
    mapred.system.dir
    /usr/local/hadoop-0.19.2/mapred/system
    true
 
 
    dfs.replication
    3
 
 
    mapred.job.tracker
    192.168.1.171:8021
 




Copy this files to all nodes



Create the directory /usr/local/hadoop-0.19.2/hdfs/data directory on each data node

Create the directory /usr/local/hadoop-0.19.2/hdfs/name directory in the name node


edit conf/slaves and add all the data nodes



192.168.1.172
192.168.1.173
192.168.1.174
192.168.1.175
192.168.1.176
192.168.1.177
192.168.1.178




On the Name node: start the cluster and mapreduce



$bin/start-dfs.sh

$bin/start-mapred.sh


Hive comes with this distribution,,,, but we need to install derby to have a centralized metadata repository




Untar the file in /usr/local/hadoop-0.19.2


Set the environment variables:



$export DERBY_INSTALL=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$export DERBY_HOME=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin
$export HADOOP=/usr/local/hadoop-0.19.2/bin/hadoop




$cd /usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin/data


Start derby


$nohup /usr/local/db-derby-10.5.1.1-bin/bin/startNetworkServer -h 0.0.0.0 &



CP derbyclient.jar and derbytools.jar to the lib's hive directory



$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbyclient.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbytools.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

Edit the files /usr/local/hadoop-0.19.2/contrib/hive/conf/hive-default.xml and /usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties

hive-default.xml:












  hadoop.bin.path
  ${user.dir}/../../../bin/hadoop
  
  Path to hadoop binary. Assumes that by default we are executing from hive



  hadoop.config.dir
  ${user.dir}/../../../conf
  
  Path to hadoop configuration. Again assumes that by default we are executing from hive/




  hive.exec.scratchdir
  /tmp/hive-${user.name}
  Scratch space for Hive jobs



  hive.metastore.local
  true
  controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM



  javax.jdo.option.ConnectionURL
  jdbc:derby://localhost:1527/metastore_db;create=true
  JDBC connect string for a JDBC metastore



  javax.jdo.option.ConnectionDriverName
  org.apache.derby.jdbc.ClientDriver
  Driver class name for a JDBC metastore



  hive.metastore.uris
  thift://
  Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.



  hive.metastore.metadb.dir
  file:///var/metastore/metadb/
  The location of filestore metadata base dir
  hive.metastore.uris
  file:///var/metastore/metadb/
  
  hive.metastore.warehouse.dir
  /user/hive/warehouse
  location of default database for the warehouse
  hive.metastore.connect.retries
  5
  Number of retries while opening a connection to metastore
  hive.metastore.rawstore.impl
  org.apache.hadoop.hive.metastore.ObjectStore
  Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database
/usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties:

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
org.jpox.cache.level2=true
org.jpox.cache.level2.type=SOFT
Run hive
/usr/local/hadoop-0.19.2/contrib/hive/bin/hive
Doing this all the metadata definitions are going to be stored in a centralized repository based on Derby database.









Friday, October 9, 2009

How to connect to mysql using C/C++

Pre-requisites


-gcc must be installed
-mysql-dev must be installed


Procedure
-Create a user with access to the database/table you need to access


grant all privileges on test.* to test identified by 'mypassword';


-Create a table and populate it:


CREATE TABLE `mytable` (
  `id` int(11) NOT NULL DEFAULT '0',
  `name` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`id`)
)



insert into mytable values (1,'Luis');
insert into mytable values (2,'Scott');
insert into mytable values (3,'Rob');


-Write the C program: 



#include <mysql.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>


main() {
   MYSQL *conn;
   MYSQL_RES *res;
   MYSQL_ROW row;


   char *server = "127.0.0.1";
   char *user = "test";
   char *password = "mypassword";
   char *database = "test";
   char query[] = "select * from mytable";
   conn = mysql_init(NULL);


   /* Connect to database */
   if (!mysql_real_connect(conn, server,
      user, password, database, 0, NULL, 0)) {
      fprintf(stderr, "%s\n", mysql_error(conn));
      exit(0);
    }


   /* Execute the query */
   if (mysql_query(conn, query)) {
       fprintf(stderr, "%s\n", mysql_error(conn));
       exit(0);
   }


   /* send SQL query */
   res = mysql_use_result(conn);


   /* output fields 1 and 2 of each row */
   while ((row = mysql_fetch_row(res)) != NULL)
      printf("%s %s \n", row[0],row[1]);


   /* Release memory used to store results and close connection */
   mysql_free_result(res);
   mysql_close(conn);
}


-Compile it
$ gcc -c `mysql_config --cflags` mysql_connection.c
$ gcc -o mysql_connection  mysql_connection.o `mysql_config --libs`


-Run it


$ ./mysql_connection
1 Luis
2 Scott
3 Rob

Thursday, October 8, 2009

Changing InnoDB Log File Size

If MySQL error log is showing this error:


InnoDB: ERROR: the age of the last checkpoint is 9433957,
InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.


This means that the innodblog is smaller than some of the executed queries and it is unable to log them.


That is not critical if you are not using transactions, but it is if so.  Here is the procedure to change the InnoDB log file size.


1. Stop MySQL
/etc/init.d/mysql stop
2. Check that MySQL is not running
ps -fea | grep myql
3. Go to the data directory
cd data
4. Check the files
ls -la
5. Move the InnoDB log files to a safe location (If MySQL does not start we will need to copy then back)
mv ib_logfile0 /tmp/ib_logfile0.old
mv ib_logfile1 /tmp/ib_logfile1.old
4. Verify that the ib_logfiles are not in the /mysql/data directory
ls -la
5. Edit /etc/my.cnf
vi /etc/my.cnf
6. Add the line
innodb_log_file_size=50M (Set the File Size, in this case 50M)
7. Start MySQL
/etc/init.d/mysql start
Starting MySQL.....                                        [  OK  ]
8. Verify that MySQL is runing
ps -fea | grep mysql
9. Verify that the InnoDB log files were created and their sizes
ls -la /mysql/data
10. Verify that you can login to MySQL
mysql -u -p -A 
11 Delete the old InnoDB log files
rm /tmp/ib_logfile*