Hive II: Using it with MySQL as a metadata db

     In the previous post, we setup a Hive with metadata stored in Derby. 

     Sometimes this is not desirable, since Derbty DB it's not as popular as MySQL / Posgre or other more known dbms. 

     In this post, we will make hive store it's metadata in a MySQL database. Remember that the data itself is still being stored in the HDFS. What we're changing here is only the location for the metadata info.

     We wont deal with data migration, for the sake of simplicity, so please know that any metadata you have will be lost. We're assuming that you are doing this in the beginning of your system setup. Be aware of that!

    MySQL Install

    This is already covered in a previous topic, where we installed a MySQL to be accessed by the Spark engine. Please refer to it if you have any other questions, but in summary, what we'll do is:

    sudo apt install mysql-server
    
sudo mysql_secure_installation
    
sudo mysql -u root

    mysql> USE mysql;
    mysql> UPDATE user SET plugin='mysql_native_password' WHERE User='root';
    mysql> FLUSH PRIVILEGES;
    mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass';
    mysql> exit;

    check for connection:

    mysql -u root -p
    mysql> exit;

    Now, let's install the mysql connector and link it to our hive instance:

    Install the mysql connector JDBC, so Spark can access it. To do that, go to 
https://dev.mysql.com/downloads/connector/j/, select your OS (in our case, Ubuntu Linux), then download the version most suitable to your system.

Then, install it by using:

$ sudo gdebi ./mysql-connector-java_8.0.27-1ubuntu20.04_all.deb 

    ln -s /usr/share/java/mysql-connector-java-8.0.28.jar /opt/hive/lib/mysql-connector-java.jar
    mysql -u root -p

    mysql> CREATE USER hive@'%' identified by '(password)';
    mysql> GRANT all on *.* to hive;
    mysql> flush privileges;
    mysql> exit;


Hive Setup

    Finally, we need to change hive-site.xml to make it go for MySQL instead of Dearby:

<property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
 <description>metadata is stored in a MySQL server</description>
 </property>
 <property>
 <name>javax.jdo.option.ConnectionDriverName</name>
 <value>com.mysql.jdbc.Driver</value>
 <description>MySQL JDBC driver class</description>
 </property>
 <property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>hiveuser</value>
 <description>user name for connecting to mysql server</description>
 </property>
 <property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>hivepassword</value>
 <description>password for connecting to mysql server</description>
 </property>

    Now, all we have to do is to format and initialize the new metadata schema:

    /opt/hive/bin/schematool -dbType mysql -initSchema

    And that's it:

    hive> create table tab_users (id int, name string, age int, sex string) clustered by(id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

    hive> show tables;
    OK
    tab_users
    Time taken: 0.188 seconds, Fetched: 1 row(s)



Comentários

Postagens mais visitadas deste blog

Data Acquisition: Connection to Relational Databases