How to connect to HBase using Apache Phoenix from Pentaho Kettle

At our office, Mustafa needed to connect to HBase from Pentaho Kettle. We found a solution to the problem, and I want to share it with anyone who needs it.

  1. Download a suitable Apache Phoenix version from the website: http://phoenix.apache.org/download.html
  2. Copy two files from the Phoenix distribution to PENTAHO_INSTALL_PATH/lib/: phoenix-core-4.3.1.jar and phoenix-4.3.1-client.jar
  3. Create a new project in Pentaho: File -> New -> Transformation
  4. From the left pane select **Design -> Input -> Table Input** and drag it onto your transformation
  5. Double-click your Table Input step and give it a name
  6. Click New next to the Connection select box to create a new database connection
  7. Give your connection a name (Ex: Phoenix)
    Connection Type: Generic Database
    Access: Native (JDBC)
    Custom Connection URL: your ZooKeeper hosts (Ex: jdbc:phoenix:localhost:2181:/hbase)
    Custom Driver Class Name: org.apache.phoenix.jdbc.PhoenixDriver
    Then click OK to close the database connection settings popup (a small JDBC sketch using the same settings follows below)
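If you want to sanity-check the driver class and connection URL outside of Kettle first, a minimal plain-JDBC sketch like the following should work. The ZooKeeper host and the MY_TABLE table name are placeholders, not values from the original setup:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: uses the same driver class and connection URL as the Kettle
// connection settings above. Adjust the ZooKeeper host and table for your cluster.
public class PhoenixConnectionTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // print the first column of each row
            }
        }
    }
}
```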

Thanks to Mustafa Artuc


10 June 2015

Posted In: apache hbase, apache phoenix, hbase, pentaho, pentaho kettle, phoenix

Bottom Line Tuning Tips for G1GC

While reading the HBase user mailing list, I came across Bryan Beaudreault's experiences[1] with using the G1 garbage collector with HBase at HubSpot. I want to note them here; you can see them below:

- If an allocation is larger than 50% of the G1 region size, it is a humongous allocation, which is more expensive to clean up. You should avoid this.

- The default region size is only a few MB, so any big batch puts or scans can easily be considered humongous. If you don't set Xms, it will be even smaller.

- Make sure you set Xms to the same value as Xmx. G1 uses this to calculate the default region size.

- Enable -XX:+PrintAdaptiveSizePolicy, which prints information you can use for debugging humongous allocations. Any time an allocation is considered humongous, it will print the size of the allocation.

- Using the output of the above, determine your optimal region size (see the sketch after this list). Region sizes must be a power of 2, and you should generally target around 2000 regions. So a compromise is sometimes needed, as you don't want to be *too* far below this number.

- Use -XX:G1HeapRegionSize=xM to set the region size. Use a power of 2.
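To make the region-size arithmetic concrete, here is a small, hypothetical Java helper based on the tips above (the class name, the ~2048-region target, and the 1 MB-32 MB power-of-two bounds are my assumptions, not HubSpot's code). It rounds heap size / 2048 up to a power of two and checks whether a given allocation would count as humongous for that region size:

```java
// Rough helper for picking a -XX:G1HeapRegionSize value, following the tips above.
public class G1RegionSizeEstimator {

    // Round heapBytes / 2048 up to a power of two, clamped to 1 MB..32 MB.
    static long suggestRegionSizeBytes(long heapBytes) {
        long target = heapBytes / 2048;           // aim for roughly 2048 regions
        long size = 1L << 20;                     // minimum region size: 1 MB
        while (size < target && size < (32L << 20)) {
            size <<= 1;                           // region size must be a power of 2
        }
        return size;
    }

    // An allocation larger than half a region is treated as humongous.
    static boolean isHumongous(long allocationBytes, long regionSizeBytes) {
        return allocationBytes > regionSizeBytes / 2;
    }

    public static void main(String[] args) {
        long heap = 32L << 30;                    // e.g. -Xms32g -Xmx32g
        long region = suggestRegionSizeBytes(heap);
        System.out.println("Suggested -XX:G1HeapRegionSize=" + (region >> 20) + "m");
        System.out.println("2 MB batch put humongous? " + isHumongous(2L << 20, region));
    }
}
```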

[1] http://apache-hbase.679495.n3.nabble.com/How-to-know-the-root-reason-to-cause-RegionServer-OOM-tp4071357p4071402.html

15 May 2015

Posted In: g1gc, hbase, java garbage collector

HBase oldWALs directory: what is it, when is it cleaned, and who uses it?

The oldWALs folder gets cleaned regularly by a chore in the master. When a WAL file is no longer needed for recovery purposes (when HBase can guarantee it has flushed all the data in the WAL file), it is moved to the oldWALs folder for archival. The log stays there until all other references to the WAL file are finished. There are currently two services which may keep the files in the archive directory. The first is a TTL process, which ensures that the WAL files are kept for at least 10 minutes. This is mainly for debugging. You can reduce this time by setting the hbase.master.logcleaner.ttl configuration property in the master; it is 600000 (ms) by default. The other one is replication. If you have replication set up, the replication processes will hang on to the WAL files until they are replicated. Even if you disable replication, the files are still referenced.
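As a quick way to see this in practice, here is a hypothetical Java sketch (the class name and the assumption that the archive lives at <hbase.rootdir>/oldWALs are mine) that lists the files in the oldWALs directory and reports whether each one is already past the log cleaner TTL:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical sketch: list the archived WALs and their ages, to see whether
// the TTL or a replication peer is what keeps them around.
public class OldWalsInspector {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Path oldWals = new Path(conf.get("hbase.rootdir"), "oldWALs");
        FileSystem fs = oldWals.getFileSystem(conf);
        long ttlMs = conf.getLong("hbase.master.logcleaner.ttl", 600000L); // default 10 min
        for (FileStatus status : fs.listStatus(oldWals)) {
            long ageMs = System.currentTimeMillis() - status.getModificationTime();
            System.out.printf("%s  age=%d min  past TTL=%b%n",
                status.getPath().getName(), ageMs / 60000, ageMs > ttlMs);
        }
    }
}
```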

Source: http://mail-archives.apache.org/mod_mbox/hbase-user/201502.mbox/%3CCAMUu0w9aOVBo7kGULiM9tXrULirqs9fm-3ra3pQccYpW_17uOw@mail.gmail.com%3E

12 March 2015

Posted In: hbase, oldWALs, replication, WAL

Replication is turned off in HBase, so why is the oldWALs directory still held?

A disabled replication peer will still hold on to the WAL files, because it guarantees no data is lost between disable and enable. You can remove_peer instead, which frees up the WAL files to be eligible for deletion. When you add the replication peer again, replication will start from the current state, whereas if you re-enable a peer, it will continue from where it left off.
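For illustration, a hypothetical sketch using the pre-HBase-2 ReplicationAdmin API (the peer id "1" and the cluster key are made-up values; adjust them for your setup):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

// Hypothetical sketch: remove a peer to release archived WALs, then re-add it later.
public class ReplicationPeerToggle {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (ReplicationAdmin admin = new ReplicationAdmin(conf)) {
            // Removing the peer (instead of just disabling it) lets the
            // archived WALs in oldWALs become eligible for deletion.
            admin.removePeer("1");

            // Later, re-adding the peer starts replication from the current
            // state rather than from where the old peer left off.
            ReplicationPeerConfig peer = new ReplicationPeerConfig();
            peer.setClusterKey("backup-zk1,backup-zk2:2181:/hbase");
            admin.addPeer("1", peer, null); // null = replicate all tables
        }
    }
}
```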

12 March 2015

Posted In: apache hbase, hbase, replication, WAL
