Export Data From Cassandra to CSV

Posted on Updated on

Karena kebutuhan untuk mindahin data Cassandra dari cluster lama ke cluster baru maka gue bikin tools untuk ngebantu diri gue sendiri untuk export data cassandra ke CSV.

Updated: The code is pushed to a Github.

Kenapa gue harus bikin tools ini?

Karena Cassandra yg versi gratisan ngga punya tools “bisa jalan” untuk backup data di dalam cassandra.
Well ada sih tools nya beberapa, tapi kalau data lu besar pasti tools tersebut ngadat, makanya kita harus bikin sendiri dengan meraba-raba library connection untuk cassandra dari datastax.

Berhubung skill java gue masih cetek, jadi sorry kalau misalkan agak culun codingan nya. Tapi gue udah test bisa narik tables dengan isi data puluhan GB dan jutaan rows ga ada masalah, so kalau ada yg butuh silahkan di compile aja sendiri pake library java untuk cassandra dari datastax ya.

package lemonade.dumpCassandra;

import java.text.SimpleDateFormat;
import java.util.Iterator;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnDefinitions.Definition;
import com.datastax.driver.core.DataType;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;


/**
 * Dump Data from Cassandra to CSV
 * 2015/01/19
 * by sphinxid <firman.gautama@gmail.com>
 *
 */
public class CassExport
{
 public static void main( String[] args )
 {
 String keyspace = "YourKeyspace";
 String table = "TableName";
 String username = "username";
 String password = "password";
 String host = "127.0.0.1";


 Cluster.Builder clusterBuilder = Cluster.builder()
 .addContactPoints(host)
 .withCredentials(username, password);
 Cluster cluster = clusterBuilder.build();
 Session session = cluster.connect(keyspace);

 Statement stmt = new SimpleStatement("SELECT * FROM " + table);
 stmt.setFetchSize(1000);
 ResultSet rs = session.execute(stmt);
 Iterator<Row> iter = rs.iterator();

 while (!rs.isFullyFetched()) {
 rs.fetchMoreResults();
 Row row = iter.next();
 if (row != null)
 {
 StringBuilder line = new StringBuilder();
 for (Definition key : row.getColumnDefinitions().asList())
 {
 String val = myGetValue(key, row);
 line.append("\"");
 line.append(val);
 line.append("\"");
 line.append(",");
 }
 line.deleteCharAt(line.length()-1);
 System.out.println(line.toString());
 }
 }

 session.close();
 cluster.close();

 }

 public static String myGetValue(Definition key, Row row)
 {
 String str = "";

 if (key != null)
 {
 String col = key.getName();

 try
 {
 if (key.getType() == DataType.cdouble())
 {
 str = new Double(row.getDouble(col)).toString();
 }
 else if (key.getType() == DataType.cint())
 {
 str = new Integer(row.getInt(col)).toString();
 }
 else if (key.getType() == DataType.uuid())
 {
 str = row.getUUID(col).toString();
 }
 else if (key.getType() == DataType.cfloat())
 {
 str = new Float(row.getFloat(col)).toString();
 }
 else if (key.getType() == DataType.timestamp())
 {
 str = row.getDate(col).toString();

 SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
 str = fmt.format(row.getDate(col));


 }
 else
 {
 str = row.getString(col);
 }
 } catch (Exception e)
 {
 str = "";
 }
 }

 return str;
 }

}

or this is the pastebin.

#Update

#Benchmark Speed
22gb of data ~ 122mil rows. 
Extracted in 444m38.061s. 
- 1 host cassandra server (4core, 8gb ram, sata hdd).
- avg 4.5k of rows / second.
Advertisements

3 thoughts on “Export Data From Cassandra to CSV

    77 ears said:
    March 24, 2015 at 2:39 pm

    Were you able to import CSV into C* table? thanks!

      Miklós Molnár said:
      April 16, 2015 at 3:58 pm

      Can you help me If I ask you?

      How should I compile and run this code with which version jars. I tried with their dependcies: cassandra-driver-core-1.0.8, cassandra-driver-core-2.0.0, cassandra-driver-core-2.1.4.
      I can run cqlsh with queries, there is no access-problem with my Cassandra(Linux).

      My last error message is:
      Exception in thread “main” java.lang.NullPointerException
      at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:127)
      at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:92)
      at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:291)
      at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:230)
      at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:170)
      at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
      at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:977)
      at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:1014)
      at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:889)
      at com.datastax.driver.core.Cluster.connect(Cluster.java:170)
      at com.datastax.driver.core.Cluster.connect(Cluster.java:189)
      at CassExport.main(CassExport.java:35) /Session session = cluster.connect(keyspace);/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s