Wednesday, 4 March 2015

Spark on Avro data

Found the spark-avro integration for reading Avro data in Spark:
https://github.com/databricks/spark-avro

It worked as expected. Thanks!

Tuesday, 3 March 2015

HBase offheap bucket cache


How to activate short-circuit local reads on HDFS & HBase


Below are the minimum configs required:

Create the directory /var/lib/hadoop-hdfs/ on each DataNode, then add the properties below to hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>

On a secure cluster, you would see dn_socket replaced with dn_1090.
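
To double-check that a config file really carries both settings, a small script along these lines might help. This is only a sketch: the helper names read_properties and short_circuit_problems are made up for illustration, and the sample XML is simply the hdfs-site.xml snippet from this post. It only inspects the file; it does not verify that the DataNode actually created the socket.

```python
import xml.etree.ElementTree as ET

# Sample config copied from the hdfs-site.xml snippet in this post.
SAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def short_circuit_problems(props):
    """List what is missing for short-circuit reads (hypothetical helper)."""
    problems = []
    if props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not set to true")
    if not props.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path is not set")
    return problems


if __name__ == "__main__":
    issues = short_circuit_problems(read_properties(SAMPLE))
    print("OK" if not issues else "\n".join(issues))
```

To run it against a real cluster config, read the file contents (for example from wherever your distribution keeps hdfs-site.xml) and pass the text to read_properties instead of SAMPLE.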


To enable it on HBase, just edit hbase-site.xml:

<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>


Sunday, 1 March 2015

Incremental backup in HBase

I was looking for a foolproof backup mechanism in HBase, similar to what we have in an RDBMS.
Currently HBase only offers full-table backup using snapshots. There is no incremental backup yet. I found one good JIRA (https://issues.apache.org/jira/browse/HBASE-7912), contributed by IBM. Hopefully it makes it into Apache HBase soon.