Month: February 2016

Always Backup Your Files in Multiple Places

I use some cheap backup servers from backupsy.com, and last week one of my servers suddenly became inaccessible.

After several days, the server came back online, BUT with a new username and password, and all my backup files on that particular server were gone :))

And today they sent me an email like this:

Please do not reply to this email. This mailbox is not maintained. Please use our ticket system instead.
———————-

Hello,

We are writing you because you have active service on our node18 server.

As you’re aware, we experienced a silent failure of a disk in this system, which corrupted some of the disk files for some VMs beyond repair.

We replaced the disk and have discontinued the use of WD SE drives in our servers. We then let the server run for two weeks and have not experienced further instability of the server. We are confident that the situation has been resolved but will continue to monitor as usual.

If your VM is not online, please get in touch with us to investigate further,

Due to the extended outage because of this situation, we have applied a credit to your account equivalent to two month’s worth of service. You can check and confirm this credit in the client area under your address information on the “main” page of the client area.

We apologize for the inconvenience you’ve experienced due to this problem. We appreciate your patronage and are committed to keeping your data safe.

If you have any questions or comments, please don’t hesitate to open a support ticket to discuss further. Please do not reply to this email as it is being sent from an outgoing-only account.

Thank you,
-Backupsy Team

Luckily, I keep my backups in more than one place, so I’m still good 🙂

HDFS Balancer and HBase Data Locality

HBase block locality and the HDFS balancer can pose some problems when combined 😦

There are 3 facts that I’ve learned from this Stack Overflow post.

  1. The Hadoop (HDFS) balancer moves blocks from one node to another so that each datanode holds roughly the same amount of data (within a configurable threshold). This messes up HBase’s data locality, meaning that a particular region may be serving a file that is no longer on its local host.
  2. HBase’s own balancer (switched on and off with balance_switch) balances the cluster so that each regionserver hosts the same number of regions (or close to it). This is separate from Hadoop’s (HDFS) balancer.
  3. If you are running only HBase, I recommend not running Hadoop’s (HDFS) balancer, as it will cause certain regions to lose their data locality. Any request to such a region then has to go over the network to one of the datanodes that holds its HFile.
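To make the locality loss concrete, here is a toy model in Python. It is only a sketch, not HBase or HDFS code: the `locality` helper, the node names, and the block layout are all made up for illustration. It assumes the usual HDFS behaviour where each block has several replicas and a writer running on a datanode gets the first replica on its own host.

```python
def locality(block_replicas, region_host):
    """Fraction of a region's HFile blocks that have a replica on region_host.

    block_replicas is a list with one set of hostnames per block
    (hypothetical representation, chosen for this sketch).
    """
    if not block_replicas:
        return 1.0
    local = sum(1 for hosts in block_replicas if region_host in hosts)
    return local / len(block_replicas)


# Region served by the regionserver on node1. It wrote the HFile itself,
# so every block has one replica on node1.
before = [
    {"node1", "node2", "node3"},
    {"node1", "node3", "node4"},
    {"node1", "node2", "node4"},
    {"node1", "node2", "node3"},
]

# Assumed outcome of an HDFS balancer run: the node1 replica of two
# blocks was moved to a newly added, emptier node5.
after = [
    {"node5", "node2", "node3"},
    {"node1", "node3", "node4"},
    {"node5", "node2", "node4"},
    {"node1", "node2", "node3"},
]

print(locality(before, "node1"))  # 1.0
print(locality(after, "node1"))   # 0.5
```

In this made-up layout, half of the region's reads would now have to cross the network, even though HDFS itself looks better balanced.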

HBase’s data locality does recover, though. Whenever a compaction occurs, the blocks being compacted are read by the regionserver serving that region, merged, and written back out locally. At that point, data locality is recovered for that region. With that, all you really need to do to add new nodes to the cluster is add them. HBase will take care of rebalancing the regions, and once these regions compact, data locality will be restored.
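The recovery step can be sketched with a similar toy model. Again, this is an illustration under assumptions, not real HBase code: `compact`, `locality`, and the node names are hypothetical, and the model keeps the block count unchanged even though a real compaction merges files and may produce a different number of blocks.

```python
import random


def locality(block_replicas, region_host):
    """Fraction of blocks that have a replica on region_host."""
    if not block_replicas:
        return 1.0
    local = sum(1 for hosts in block_replicas if region_host in hosts)
    return local / len(block_replicas)


def compact(block_replicas, region_host, all_hosts, replication=3, seed=0):
    """Model a compaction: the regionserver rewrites the data itself,
    so each new block gets one replica on region_host and the remaining
    replicas on other hosts (picked at random in this sketch)."""
    rng = random.Random(seed)
    others = [h for h in all_hosts if h != region_host]
    return [
        {region_host, *rng.sample(others, replication - 1)}
        for _ in block_replicas
    ]


hosts = ["node1", "node2", "node3", "node4", "node5"]

# The region has just been moved to the new node5, which holds none of
# its blocks yet, so locality starts at zero.
blocks = [
    {"node1", "node2", "node3"},
    {"node1", "node3", "node4"},
    {"node2", "node3", "node4"},
]
print(locality(blocks, "node5"))

# After the region compacts on node5, every block has a local replica.
compacted = compact(blocks, "node5", hosts)
print(locality(compacted, "node5"))
```

The point of the sketch is only that the rewrite happens through the regionserver that currently hosts the region, which is why simply adding nodes and waiting for compactions is enough.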