How to Convert Compressed Text from bzip2 to gzip Without Temporary Files, Multithreaded

The tools you will need are lbzip2, pigz, and split (optional).

lbzip2 => http://lbzip2.org/
pigz => http://www.zlib.net/pigz/

If you are using Ubuntu (I'm using 14.04 LTS), you can easily install lbzip2 and pigz with apt-get (or aptitude):

$ sudo apt-get install lbzip2 pigz

Or you could download the source code from their websites and compile it manually.

Let's say you have a 500 GB bzip2-compressed text file called file01.txt.bz2, and you want to split it into multiple gzipped files of 1,500,000 lines each so that your Hadoop cluster can process it faster.

$ lbzcat file01.txt.bz2 | split -d -a 10 -l 1500000 --filter='pigz > newfile01-$FILE.gz'
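The pipeline above can be wrapped in a small function, which makes it easier to try on a small sample before running it on the full file. This is only a sketch: the function names and the DECOMP and COMP override variables are my own assumptions, not part of lbzip2, pigz, or split. It passes an explicit output prefix to split, so the chunk names differ slightly from the one-liner, and it needs GNU split for --filter.

```shell
#!/bin/sh
# Sketch only: recompress_split, count_lines, DECOMP and COMP are hypothetical names.
recompress_split() {
    input=$1    # bzip2-compressed source file
    prefix=$2   # output name prefix, e.g. newfile01-
    lines=$3    # number of lines per output chunk
    decomp=${DECOMP:-lbzcat}   # multithreaded bzip2 decompressor
    comp=${COMP:-pigz}         # multithreaded gzip compressor

    # split exports $FILE (prefix + numeric suffix) to the filter command,
    # so each chunk is compressed on the fly and never written uncompressed.
    "$decomp" "$input" |
        split -d -a 10 -l "$lines" --filter="$comp > \"\$FILE.gz\"" - "$prefix"
}

# Helper to verify the result: total number of lines across all chunks.
count_lines() {
    gzip -dc "$1"*.gz | wc -l
}
```

For the example above you would call it as: recompress_split file01.txt.bz2 newfile01- 1500000, and then compare count_lines newfile01- against the line count you expect.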

 

How to Export/Import HBase Table

– EXPORT –

Example: the HBase table is hbase_test_table
and today's date is 20160820.

1. Create a temporary folder in HDFS for the exported files (-p also creates missing parent folders):
$ hadoop fs -mkdir -p /tmp/hbasedump/20160820

2. Run this shell command on any Hadoop node that has the HBase gateway role:
$ hbase org.apache.hadoop.hbase.mapreduce.Export hbase_test_table /tmp/hbasedump/20160820/hbase_test_table

3. Don't forget to record the table structure, so you can import the data back later if needed:
$ hbase shell
hbase-shell> describe 'hbase_test_table'

Table hbase_test_table is ENABLED
hbase_test_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'test_cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}
1 row(s) in 0.1290 seconds
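The three export steps can be collected into one function. This is a sketch under assumptions: run, export_table, and the DRY_RUN variable are names I made up for illustration, and the schema file name is arbitrary. With DRY_RUN set, the function only prints the commands, so you can check the paths before touching the cluster.

```shell
#!/bin/sh
# Sketch only: run, export_table and DRY_RUN are hypothetical names.
run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

export_table() {
    table=$1
    day=${2:-$(date +%Y%m%d)}      # defaults to today, e.g. 20160820
    dir=/tmp/hbasedump/$day

    # Step 1: create the dated HDFS folder (-p also creates missing parents).
    run hadoop fs -mkdir -p "$dir" || return 1
    # Step 2: run the MapReduce export job into a per-table subfolder.
    run hbase org.apache.hadoop.hbase.mapreduce.Export "$table" "$dir/$table" || return 1
    # Step 3: save the table description to a local file for the import later.
    if [ -z "${DRY_RUN:-}" ]; then
        echo "describe '$table'" | hbase shell > "$table.describe.txt"
    fi
}
```

Example: DRY_RUN=1 export_table hbase_test_table 20160820 prints the mkdir and Export commands shown in steps 1 and 2.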

– IMPORT –

Example: the target HBase table is test_import_hbase_test_table.

1. Let's say the exported dump of the table is in HDFS at /tmp/hbasedump/20160820/hbase_test_table
and you want to import it into a new table, "test_import_hbase_test_table".

2. Open the HBase shell and prepare the target table:
$ hbase shell
- Create the table "test_import_hbase_test_table" if it doesn't exist yet.
- Create it with the same column family name as the source table (see export step 3 above).

3. Start the import process:
$ hbase org.apache.hadoop.hbase.mapreduce.Import "test_import_hbase_test_table" "/tmp/hbasedump/20160820/hbase_test_table"
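For step 2, the create statement can be generated instead of typed by hand. A minimal sketch, assuming a single column family; create_stmt is a hypothetical helper, and the family name and compression are the values from the describe output in the export section.

```shell
#!/bin/sh
# Sketch only: create_stmt is a hypothetical helper, not an HBase command.
create_stmt() {
    table=$1                 # target table name
    cf=$2                    # column family name copied from the describe output
    compression=${3:-NONE}   # e.g. SNAPPY, if the cluster supports it
    printf "create '%s', {NAME => '%s', COMPRESSION => '%s'}\n" \
        "$table" "$cf" "$compression"
}
```

You could then pipe the statement into the shell: create_stmt test_import_hbase_test_table test_cf SNAPPY | hbase shell, and run the Import command from step 3 afterwards.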

How to Mount HBase Table as Hive External Table

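HBase table: "h_test_table"
Hive table: "test_table"

Notes:
Each entry in the column mapping has the form family:qualifier. In "attribute:column1", "attribute" is the COLUMN FAMILY and "column1" is the column qualifier. The special entry ":key" maps the HBase row key.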

Example:

CREATE EXTERNAL TABLE test_table (
raw_key STRING,
column1 STRING,
column2 STRING,
value STRING,
updated_at BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,attribute:column1,attribute:column2,attribute:value,attribute:updated_at'
)
TBLPROPERTIES("hbase.table.name" = "h_test_table");
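Hive expects exactly one mapping entry per Hive column, and an HBase column that is mapped twice is usually a copy-paste slip. Before running the statement, a quick check of the mapping string can catch both problems. The two helpers below are hypothetical names of my own, shown only as a sketch.

```shell
#!/bin/sh
# Sketch only: mapping_count and mapping_dups are hypothetical helpers.
mapping_count() {
    # Number of comma-separated entries in the mapping string.
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

mapping_dups() {
    # Print entries that appear more than once (empty output means none).
    printf '%s\n' "$1" | tr ',' '\n' | sort | uniq -d
}
```

For the table above, mapping_count should report 5 (one per Hive column) and mapping_dups should print nothing.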

Hack by the Beach in Jakarta

This past weekend, we participated in the BCA Finhacks 2016 Hackathon. It was a hackathon event at the Segarra, on Ancol Beach in Jakarta.
The weather was very hot and humid during the daytime, but the organizers provided free-flow food and drinks for every participant.

Anyway, we built an app called AngpaO. It's basically similar to Eventbrite, but with two extra main features: "live streaming" of the event and a "donation" feature using BCA E-Wallet integration. If you would like to know more about the app, our presentation slide deck is uploaded here.

Our team was called "Opcode" and consisted of three people:
Ihsan Fauzi Rahman (Cermati.com)
Firman Gautama (ADSKOM)
Supardi (Lippo X)

Here is what our team did in the 24-hour time frame:

1. Reverse-engineered the PHP BCA Finhacks SDK.
- Extracted the bca (Composer phar) file into raw PHP.
- Implemented a code hook to cache the 'access_token' from the OAuth2 class in the BCA SDK.
-- Why? At the briefing on the morning of the first day, we were told we were limited to 5 access_token requests per minute, and the BCA SDK didn't cache the token. We were worried that we could be throttled during development.

2. Created a PHP HTTP wrapper for the BCA API.
- Our main app is written in NodeJS. At the time, BCA only provided SDKs for PHP and Java, so we created an internal HTTP wrapper for our NodeJS app.
- We implemented the following BCA APIs in our PHP HTTP wrapper:
-- User Registration
-- User Update
-- Top-up
-- Payment
-- Transaction History

3. Created a NodeJS frontend + API for our web and Android apps.

- We implemented the following features:
-- Login and registration for the web (integrated with Facebook accounts)
-- User profile update
-- Creating a new AngpaO event
-- Listing events for other users/guests
-- "Donation" history for the event owner
- For the donation history, we didn't rely only on the BCA API, because it only returns the last 10 transactions. So we implemented our own transaction history.

4. Created a native Android app.

- We designed it with the smartphone as the AngpaO wallet in mind, so every transaction should happen only via a smartphone.
- We implemented the following features:
-- Login and registration from Android, integrated with Facebook accounts
-- AngpaO event feed list
-- AngpaO event view with live video stream
-- Top-up
-- Donate
-- QR code reading for AngpaO events
-- Importing the user profile from Facebook

5. Created live video + audio streaming for our AngpaO events.
- Backend: Wowza with the RTMP and HLS protocols.
- Frontend: JW Player for playback in the browser (we also embedded this in our Android app).
- Broadcaster: Open Broadcaster Software (OBS), with a webcam as the video and audio source (for the demo).
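To illustrate the access_token caching from step 1: our actual hook lived inside the PHP SDK, so the shell function below is not that code, just a sketch of the same idea. The function name, the cache-file format, and the TTL value are all assumptions made for the example. It reuses a stored token until it is older than the TTL and only then calls the fetch command again, which keeps the number of token requests well under a per-minute limit.

```shell
#!/bin/sh
# Sketch only: cached_token and its cache-file format are hypothetical,
# not part of the BCA SDK.
cached_token() {
    cache=$1   # file that stores "<epoch-seconds> <token>"
    ttl=$2     # seconds a cached token stays valid
    shift 2    # remaining arguments: the command that fetches a new token

    now=$(date +%s)
    if [ -f "$cache" ]; then
        read -r saved token < "$cache"
        if [ $((now - ${saved:-0})) -lt "$ttl" ]; then
            echo "$token"          # still fresh: no request is made
            return 0
        fi
    fi
    token=$("$@") || return 1      # cache miss or expired: fetch a new token
    echo "$now $token" > "$cache"
    echo "$token"
}
```

Usage would look like: cached_token /tmp/bca_token.cache 3000 my_fetch_command, where my_fetch_command stands for whatever prints a fresh token.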

 

======
BONUS Thoughts😀
======

1. The first version of the BCA PHP SDK was broken. (On 18 April 2016 they gave everyone the SDK to download and test.)
- How was it broken? After we reverse-engineered the SDK, we found that there were no getters for OAuth2Client and JsonParser in the SDK's main class loader.
(BCA fixed this on 23 April 2016. To be prepared, we had also fixed the SDK ourselves, but on the first day of the hackathon they told us there was a new version of the SDK.)
2. The payment function of the BCA API was returning errors for a while?
- It reported the date format as wrong, even though we used the format shown in the example: https://finhacks.id/api/?php#payment. (Fixed?)
--> It seems that some of the BCA API servers/endpoints use a different time zone (GMT+5 instead of GMT+7). How do we know? We looked at the transaction date in the BCA server response.
--> We used a workaround:
- For payment, make sure the 'request date' sent to the server is earlier than the current datetime on the BCA API server, or else it will be treated as an invalid request.
- So we deliberately sent a 'request date' that was slightly behind, to avoid the race condition that caused the 'payment' request to fail.
3. The Wi-Fi access point was overloaded, probably because so many participants' devices were connected to it, so the internet connection was not very stable.
4. The BCA API sandbox suddenly disappeared about 12 hours before the event started 😦
5. Not every team was treated equally.
I don't want to point out who is who, but you can ask the many other people/teams that participated in the event to confirm :)

Always Backup Your Files in Multiple Places

I use some cheap backup servers from backupsy.com, and last week one of my servers suddenly became inaccessible.

After several days, the server came back online, BUT with a new username and password, and all my backup files on that particular server were gone :))

And today they sent me an email like this:

Please do not reply to this email. This mailbox is not maintained. Please use our ticket system instead.
———————-

Hello,

We are writing you because you have active service on our node18 server.

As you’re aware, we experienced a silent failure of a disk in this system, which corrupted some of the disk files for some VMs beyond repair.

We replaced the disk and have discontinued the use of WD SE drives in our servers. We then let the server run for two weeks and have not experienced further instability of the server. We are confident that the situation has been resolved but will continue to monitor as usual.

If your VM is not online, please get in touch with us to investigate further,

Due to the extended outage because of this situation, we have applied a credit to your account equivalent to two month’s worth of service. You can check and confirm this credit in the client area under your address information on the “main” page of the client area.

We apologize for the inconvenience you’ve experienced due to this problem. We appreciate your patronage and are committed to keeping your data safe.

If you have any questions or comments, please don’t hesitate to open a support ticket to discuss further. Please do not reply to this email as it is being sent from an outgoing-only account.

Thank you,
-Backupsy Team

Luckily, I keep my backups in more than one place, so I'm still good 🙂