
Export Data From Cassandra to CSV


I needed to move Cassandra data from an old cluster to a new one, so I wrote a tool to help myself export Cassandra data to CSV.

Update: the code has been pushed to GitHub.

Why did I have to build this tool?

Because the free edition of Cassandra doesn't come with a tool that "actually works" for backing up the data inside Cassandra.
Well, there are a few tools out there, but once your data gets big they're bound to stall, so we have to build our own by feeling our way through the DataStax connection library for Cassandra.

My Java skills are still pretty shallow, so sorry if the code looks a bit clumsy. But I've tested it pulling tables holding tens of GB of data and millions of rows without any problems, so if anyone needs it, feel free to compile it yourself against the DataStax Java library for Cassandra.

package lemonade.dumpCassandra;

import java.text.SimpleDateFormat;
import java.util.Iterator;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnDefinitions.Definition;
import com.datastax.driver.core.DataType;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;


/**
 * Dump Data from Cassandra to CSV
 * 2015/01/19
 * by sphinxid <firman.gautama@gmail.com>
 *
 */
public class CassExport
{
    public static void main( String[] args )
    {
        String keyspace = "YourKeyspace";
        String table = "TableName";
        String username = "username";
        String password = "password";
        String host = "127.0.0.1";

        Cluster.Builder clusterBuilder = Cluster.builder()
            .addContactPoints(host)
            .withCredentials(username, password);
        Cluster cluster = clusterBuilder.build();
        Session session = cluster.connect(keyspace);

        try
        {
            Statement stmt = new SimpleStatement("SELECT * FROM " + table);
            // Page size: rows are pulled from the server 1000 at a time.
            stmt.setFetchSize(1000);
            ResultSet rs = session.execute(stmt);
            Iterator<Row> iter = rs.iterator();

            // The iterator fetches the next page on its own when the current
            // one runs out. Looping on hasNext() visits every row; looping on
            // !rs.isFullyFetched() would stop before the last page is printed.
            while (iter.hasNext())
            {
                Row row = iter.next();
                StringBuilder line = new StringBuilder();
                for (Definition key : row.getColumnDefinitions().asList())
                {
                    String val = myGetValue(key, row);
                    line.append("\"");
                    // Double any embedded quotes so the CSV stays parseable.
                    line.append(val.replace("\"", "\"\""));
                    line.append("\"");
                    line.append(",");
                }
                if (line.length() > 0)
                {
                    // Drop the trailing comma.
                    line.deleteCharAt(line.length() - 1);
                }
                System.out.println(line.toString());
            }
        }
        finally
        {
            // Release connections even if the export fails halfway.
            session.close();
            cluster.close();
        }
    }

    /**
     * Returns the column value as a string, or "" when the value is null
     * or cannot be read.
     */
    public static String myGetValue(Definition key, Row row)
    {
        String str = "";

        if (key != null)
        {
            String col = key.getName();

            try
            {
                // Null columns become empty fields (getInt() would return 0).
                if (row.isNull(col))
                {
                    return "";
                }

                DataType type = key.getType();

                if (type.equals(DataType.cdouble()))
                {
                    str = String.valueOf(row.getDouble(col));
                }
                else if (type.equals(DataType.cint()))
                {
                    str = String.valueOf(row.getInt(col));
                }
                else if (type.equals(DataType.uuid()))
                {
                    str = row.getUUID(col).toString();
                }
                else if (type.equals(DataType.cfloat()))
                {
                    str = String.valueOf(row.getFloat(col));
                }
                else if (type.equals(DataType.timestamp()))
                {
                    // getDate() returns java.util.Date in the 2.x driver;
                    // newer 3.x drivers use getTimestamp() for this type.
                    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
                    str = fmt.format(row.getDate(col));
                }
                else
                {
                    // Only text-like columns can be read with getString();
                    // other types throw and end up as "" via the catch below.
                    str = row.getString(col);
                }
            } catch (Exception e)
            {
                str = "";
            }
        }

        return (str == null) ? "" : str;
    }

}
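
The exporter writes every value inside double quotes and formats timestamps as yyyy-MM-dd HH:mm:ssZ. A value that itself contains a double quote needs that quote doubled, otherwise a CSV parser will misread the row. Here is a minimal sketch of that formatting step on its own, without the Cassandra driver. The class and method names (`CsvFields`, `quote`, `joinRow`, `formatTimestamp`) are made up for illustration and are not part of the tool, and pinning the time zone to UTC is an assumption (the code above uses the JVM default zone).

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, for illustration only.
final class CsvFields {

    // Wrap a value in double quotes, doubling any embedded quotes.
    // A null value is written as an empty quoted field.
    static String quote(String value) {
        String safe = (value == null) ? "" : value;
        return "\"" + safe.replace("\"", "\"\"") + "\"";
    }

    // Join already-stringified column values into one CSV line.
    static String joinRow(List<String> values) {
        StringBuilder line = new StringBuilder();
        for (String v : values) {
            if (line.length() > 0) {
                line.append(',');
            }
            line.append(quote(v));
        }
        return line.toString();
    }

    // Same pattern as the exporter; UTC is an assumption made here so the
    // output does not depend on the machine's time zone.
    static String formatTimestamp(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(
            "id-1",
            "say \"hi\"",
            formatTimestamp(new Date(0L)));
        // prints: "id-1","say ""hi""","1970-01-01 00:00:00+0000"
        System.out.println(joinRow(row));
    }
}
```

Doubling the quote is the convention most CSV readers expect, so a file written this way should load into a spreadsheet or another import tool without the columns shifting.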

The same code is also on pastebin.

#Update

#Benchmark Speed
22 GB of data, ~122 million rows.
Extracted in 444m38.061s.
- 1 Cassandra server host (4 cores, 8 GB RAM, SATA HDD).
- avg ~4.5k rows/second.
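
For anyone who wants to check the average, here is a small sketch of the arithmetic behind it. The class and method names (`ExportThroughput`, `rowsPerSecond`) are made up for illustration, and the row count is the rounded ~122 million figure, so the result is approximate.

```java
// Hypothetical helper class, for illustration only.
final class ExportThroughput {

    // Rows per second for a run that took the given minutes plus seconds.
    static double rowsPerSecond(long rows, long minutes, double seconds) {
        double totalSeconds = minutes * 60 + seconds;
        return rows / totalSeconds;
    }

    public static void main(String[] args) {
        // ~122 million rows in 444m38.061s, the numbers from the benchmark.
        double rate = rowsPerSecond(122_000_000L, 444, 38.061);
        System.out.printf("%.0f rows/second%n", rate);
    }
}
```

444m38.061s is about 26,678 seconds, so this works out to roughly 4,570 rows per second, in line with the ~4.5k average.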