Luis Roncancio Data Tech Docs: 2009

Tuesday, October 27, 2009

Simple Hadoop Hive Derby Installation and configuration

This article shows how to install Hadoop and get hive working with centralized metadata repository based on Derby database:

For this case we are going to have 7 datanodes and 1 namenode

hadoopNN01: 192.168.1.171
hadoopDN01: 192.168.1.172
hadoopDN02: 192.168.1.173
hadoopDN03: 192.168.1.174
hadoopDN04: 192.168.1.175
hadoopDN05: 192.168.1.176
hadoopDN06: 192.168.1.177
hadoopDN07: 192.168.1.178

-Hadoop 0.19.2 (There is a new version, but I will install this one for compatibility with other tools I am planning to install (i.e. hive, pig, etc)
-Install Java JDK on each server
-Create a hadoop account on each server
-Configure SSH

Server 1
$ ssh-keygen -t dsa

Server 2

$mkdir .ssh (Under the home directory)

Server 1

scp .ssh/id_dsa.hadoopnn01:/home/hadoop/.ssh/authorized_keys2

Server 2

$chmod 700 .ssh

$chmod 600 .ssh/authorized_keys2

Do the same for all servers in order to guarantee can be sshed between them without password

Download hadoop

Unpacked it under /user/local on each node

Set the following variables, on each node:

export HADOOP_HOME=/usr/local/hadoop-0.19.2

export JAVA_HOME=/usr/local/src/jdk1.6.0_12

export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin

Edit conf/hadoop-site.xml (This file must be in all nodes)

fs.default.name

hdfs://192.168.1.171

dfs.data.dir

/usr/local/hadoop-0.19.2/hdfs/data

true

dfs.name.dir

/usr/local/hadoop-0.19.2/hdfs/name

true

hadoop.tmp.dir

/usr/local/hadoop-0.19.2/tmp

true

mapred.system.dir

/usr/local/hadoop-0.19.2/mapred/system

true

dfs.replication

mapred.job.tracker

192.168.1.171:8021

Copy this files to all nodes

Create the directory /usr/local/hadoop-0.19.2/hdfs/data directory on each data node

Create the directory /usr/local/hadoop-0.19.2/hdfs/name directory in the name node

edit conf/slaves and add all the data nodes

192.168.1.172

192.168.1.173

192.168.1.174

192.168.1.175

192.168.1.176

192.168.1.177

192.168.1.178

On the Name node: start the cluster and mapreduce

$bin/start-dfs.sh

$bin/start-mapred.sh

Hive comes with this distribution,,,, but we need to install derby to have a centralized metadata repository

Download derby

Untar the file in /usr/local/hadoop-0.19.2

Set the environment variables:

$export DERBY_INSTALL=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin

$export DERBY_HOME=/usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin

$export HADOOP=/usr/local/hadoop-0.19.2/bin/hadoop

$cd /usr/local/hadoop-0.19.2/db-derby-10.5.1.1-bin/data

Start derby

$nohup /usr/local/db-derby-10.5.1.1-bin/bin/startNetworkServer -h 0.0.0.0 &

CP derbyclient.jar and derbytools.jar to the lib's hive directory

$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbyclient.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

$cp /usr/local/db-derby-10.5.1.3-bin/lib/derbytools.jar /usr/local/hadoop-0.19.2/contrib/hive/lib

Edit the files /usr/local/hadoop-0.19.2/contrib/hive/conf/hive-default.xml and /usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties

hive-default.xml:












  hadoop.bin.path
  ${user.dir}/../../../bin/hadoop
  
  Path to hadoop binary. Assumes that by default we are executing from hive



  hadoop.config.dir
  ${user.dir}/../../../conf
  
  Path to hadoop configuration. Again assumes that by default we are executing from hive/




  hive.exec.scratchdir
  /tmp/hive-${user.name}
  Scratch space for Hive jobs



  hive.metastore.local
  true
  controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM



  javax.jdo.option.ConnectionURL
  jdbc:derby://localhost:1527/metastore_db;create=true
  JDBC connect string for a JDBC metastore



  javax.jdo.option.ConnectionDriverName
  org.apache.derby.jdbc.ClientDriver
  Driver class name for a JDBC metastore



  hive.metastore.uris
  thift://
  Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.



  hive.metastore.metadb.dir
  file:///var/metastore/metadb/
  The location of filestore metadata base dir
  hive.metastore.uris
  file:///var/metastore/metadb/
  

  hive.metastore.warehouse.dir
  /user/hive/warehouse
  location of default database for the warehouse

  hive.metastore.connect.retries
  5
  Number of retries while opening a connection to metastore

  hive.metastore.rawstore.impl
  org.apache.hadoop.hive.metastore.ObjectStore
  Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database


/usr/local/hadoop-0.19.2/contrib/hive/conf/jpox.properties:

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
org.jpox.cache.level2=true
org.jpox.cache.level2.type=SOFT

Run hive

/usr/local/hadoop-0.19.2/contrib/hive/bin/hive

Doing this all the metadata definitions are going to be stored in a centralized repository based on Derby database.

Friday, October 9, 2009

How to connect to mysql using C/C++

Pre-requisites

-gcc must be installed
-mysql-dev must be installed

Procedure
-Create a user with access to the database/table you need to access

grant all privileges on test.* to test identified by 'mypassword';

-Create a table and populate it:

CREATE TABLE `mytable` (
  `id` int(11) NOT NULL DEFAULT '0',
  `name` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`id`)
)

insert into mytable values (1,'Luis');
insert into mytable values (2,'Scott');
insert into mytable values (3,'Rob');

-Write the C program:

#include <mysql.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

main() {
   MYSQL *conn;
   MYSQL_RES *res;
   MYSQL_ROW row;

   char *server = "127.0.0.1";
   char *user = "test";
   char *password = "mypassword";
   char *database = "test";
   char query[] = "select * from mytable";
   conn = mysql_init(NULL);

   /* Connect to database */
   if (!mysql_real_connect(conn, server,
   user, password, database, 0, NULL, 0)) {
   fprintf(stderr, "%s\n", mysql_error(conn));
   exit(0);
   }

   /* Execute the query */
   if (mysql_query(conn, query)) {
   fprintf(stderr, "%s\n", mysql_error(conn));
   exit(0);
   }

   /* send SQL query */
   res = mysql_use_result(conn);

   /* output fields 1 and 2 of each row */
   while ((row = mysql_fetch_row(res)) != NULL)
   printf("%s %s \n", row[0],row[1]);

   /* Release memory used to store results and close connection */
   mysql_free_result(res);
   mysql_close(conn);
}

-Compile it

$ gcc -c `mysql_config --cflags` mysql_connection.c

$ gcc -o mysql_connection mysql_connection.o `mysql_config --libs`

-Run it

$ ./mysql_connection

1 Luis

2 Scott

3 Rob

Thursday, October 8, 2009

Changing InnoDB Log File Size

If MySQL error log is showing this error:

InnoDB: ERROR: the age of the last checkpoint is 9433957,

InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

This means that the innodblog is smaller than some of the executed queries and it is unable to log them.

That is not critical if you are not using transactions, but it is if so. Here is the procedure to change the InnoDB log file size.

1. Stop MySQL

/etc/init.d/mysql stop

2. Check that MySQL is not running

ps -fea | grep myql

3. Go to the data directory

cd data

4. Check the files

ls -la

5. Move the InnoDB log files to a safe location (If MySQL does not start we will need to copy then back)

mv ib_logfile0 /tmp/ib_logfile0.old

mv ib_logfile1 /tmp/ib_logfile1.old

4. Verify that the ib_logfiles are not in the /mysql/data directory

ls -la

5. Edit /etc/my.cnf

vi /etc/my.cnf

6. Add the line

innodb_log_file_size=50M (Set the File Size, in this case 50M)

7. Start MySQL

/etc/init.d/mysql start

Starting MySQL..... [ OK ]

8. Verify that MySQL is runing

ps -fea | grep mysql

9. Verify that the InnoDB log files were created and their sizes

ls -la /mysql/data

10. Verify that you can login to MySQL

mysql -u -p -A

11 Delete the old InnoDB log files

rm /tmp/ib_logfile*

Luis Roncancio Data Tech Docs

Tuesday, October 27, 2009

Simple Hadoop Hive Derby Installation and configuration

Friday, October 9, 2009

How to connect to mysql using C/C++

Thursday, October 8, 2009

Changing InnoDB Log File Size

Popular Posts

About Me

Search This Blog

Blog Archive

Followers

Luis Roncancio Data Tech Docs

Tuesday, October 27, 2009

Simple Hadoop Hive Derby Installation and configuration

Friday, October 9, 2009

How to connect to mysql using C/C++

Thursday, October 8, 2009

Changing InnoDB Log File Size

Popular Posts

About Me

Search This Blog

Blog Archive

Followers

Subscribe To