kodingan

How to Render an HTML Page with Selenium WebDriver + PhantomJS in Python

For a better view, check out the GitHub link.

#
# 20160929 - by sphinxid - firman.gautama@gmail.com
#
# Example of multithreaded Selenium WebDriver with PhantomJS in Python.
# It uses 10 threads + 10 PhantomJS instances to make 25000 requests to "url".
#
# Note: webdriver.PhantomJS was deprecated in Selenium 3 and removed in
# Selenium 4, so this needs selenium<4 and a phantomjs binary on the PATH.
#

import signal
import time
from concurrent.futures import ThreadPoolExecutor, wait
from queue import Queue

from selenium import webdriver


def fetch(url, drivers):
    # Borrow a free driver so that two threads never share one PhantomJS.
    driver = drivers.get()
    try:
        driver.get(url)
        print(1)  # 1 = page loaded
        return 0
    except Exception:
        print(2)  # 2 = request failed or timed out
        return 1
    finally:
        # Hand the driver back for the next request.
        drivers.put(driver)


def clean_up(driver):
    try:
        driver.service.process.send_signal(signal.SIGTERM)
        driver.quit()
    except Exception:
        pass


if __name__ == "__main__":
    num_thread = 10
    num_request = 25000
    url = "http://www.yahoo.com/"

    # Instantiate the thread pool
    pool = ThreadPoolExecutor(num_thread)
    drivers = Queue()
    parr = []

    # Instantiate one PhantomJS per thread
    for x in range(num_thread):
        print("Initializing PhantomJS %s ..." % x)
        driver = webdriver.PhantomJS()
        # Set the timeouts once, before any page is requested
        driver.implicitly_wait(2)
        driver.set_page_load_timeout(2)
        parr.append(driver)
        drivers.put(driver)
        print(" OK.")

    start_time = time.time()

    # Submit every request; each worker borrows whichever driver is free
    futures = [pool.submit(fetch, url, drivers) for x in range(num_request)]
    wait(futures)
    print("Elapsed: %.1f seconds" % (time.time() - start_time))

    # clean_up: make sure every phantomjs process is closed
    for driver in parr:
        clean_up(driver)
    pool.shutdown()

Add Youtube IP Addresses to Mikrotik Address List via DNS Sync (Both A Records and CNAME)

This script syncs the MikroTik DNS cache entries that match a regex keyword you specify into a firewall address list, so you can use that list to filter or manipulate the matching traffic.

#
# yutup v0.1
#
# Based on http://wiki.mikrotik.com/wiki/Sync_Address_List_with_DNS_Cache
# A Mikrotik script for syncing address list with DNS cache lookup by certain keyword.
# The script also tries to turn each matching CNAME into A records by resolving the CNAME target.
#
# 2015/03/07
# by Firman Gautama <firman@kodelatte.com>
#
# Tested on ROS v6.22 (RB750GL)
#

# REGEX match
:local search "youtube|googlevideo.com"
:local myaddressname "FRMN_YOUTUBE";

# Define Arrays
:local IPs ""
:local CNAMEs ""
:local Names ""

:for x from=0 to 2 step=1 do={

# Reset Arrays
:set IPs ""
:set CNAMEs ""
:set Names ""

/ip dns cache all {
   :local name; :local type; :local data

   :foreach rule in=[print detail as-value where (static=no)] do={
      :set name ""; :set type ""; :set data ""

      :local num 0
      :foreach item in=$rule do={
         :if ($num = 2) do={ :set name $item }
         :if ($num = 4) do={ :set type $item }
         :if ($num = 1) do={ :set data $item }
         :set num ($num + 1)
      }

# identify CNAME and try to Resolve it
     :if ([:tostr $name] ~ [:tostr $search] && $type = "CNAME") do={
#         :put ("Found CNAME " . $name . " -> " . $data)
         :set CNAMEs ($CNAMEs . $data . ",")
         :set Names ($Names . $name . ",")

         :resolve $data;
      }

# get DNS A Record
     :if ([:tostr $name] ~ [:tostr $search] && $type = "A") do={
#         :put ("Found A Record " . $name . " -> " . $data)
         :set IPs ($IPs . $data . ",")
         :set Names ($Names . $name . ",")
      }

   }
# /ip dns cache all
}
}


##### 

##### clean up unique A Records
:local uniqueIPs ""
:set uniqueIPs ""


:foreach val in=[:toarray $IPs] do={
  :local unik 1;

 :if ($uniqueIPs = "") do={
    :set uniqueIPs ($uniqueIPs . $val . ",");
  }

  :if ($uniqueIPs != "") do={
    :set unik 1;

    :foreach val2 in=[:toarray $uniqueIPs] do={
      :if ($val = $val2) do={
        :set unik 0;
      }
    }

    :if ($unik = 1) do={
      :set uniqueIPs ($uniqueIPs . $val . ",");
    }
  }
}

########################


:put ("DNS cache search found " . [:len [:toarray $IPs]] . " A match(es) for '" . $search . "'")
:put ("DNS cache search found " . [:len [:toarray $uniqueIPs]] . " Unique A match(es) for '" . $search . "'")
:put ("DNS cache search found " . [:len [:toarray $CNAMEs]] . " CNAME match(es) for '" . $search . "'")


# Search through IPs and add to address list
/ip firewall address-list {
   :local findex; :local listaddr; :local IPsFound ""

   :put ("Searching address list '" . $myaddressname . "'...")
   :foreach l in=[find list=($myaddressname)] do={
      :set listaddr [get $l address]
      :if ([:len [:find [:toarray $uniqueIPs] [:toip $listaddr]]] = 0) do={
         :put ("   " . $listaddr . " not found in search, removing...")
         remove $l
      } else={
         :put ($listaddr . " found address in IPs")
         :set IPsFound ($IPsFound . $listaddr . ",")
      }
   }

# Add remaining records to address list
   :set findex 0
   :foreach ip in=[:toarray $uniqueIPs] do={
      :if ([:len [:find [:toarray $IPsFound] [:toip $ip]]] = 0) do={
         :put ("   Adding address " . $ip)
         add list=($myaddressname) address=[:toip $ip] comment=([:pick [:toarray $Names] $findex]) disabled=no
      }
      :set findex ($findex + 1)
   }
# /ip firewall address-list
}

Or see it on GitHub.
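
For readers who don't read RouterOS script, the same flow can be sketched in Python. This is only an illustration, not a port of the script: the cache entries, the addresses, and the function names `matching_ips` and `sync` are all made up for the example, and the CNAME resolving step is left out.

```python
import re

# Hypothetical DNS cache entries (name, record type, data); on the router
# these would come from /ip dns cache all.
CACHE = [
    ("www.youtube.com", "CNAME", "youtube-ui.l.google.com"),
    ("youtube-ui.l.google.com", "A", "203.0.113.10"),
    ("r1.googlevideo.com", "A", "203.0.113.20"),
    ("r2.googlevideo.com", "A", "203.0.113.20"),
    ("example.org", "A", "198.51.100.7"),
]

# Same keyword regex as the script's "search" variable.
SEARCH = re.compile(r"youtube|googlevideo.com")


def matching_ips(cache):
    """Return the unique A-record addresses whose name matches SEARCH."""
    unique = []
    for name, rtype, data in cache:
        if rtype == "A" and SEARCH.search(name) and data not in unique:
            unique.append(data)
    return unique


def sync(address_list, wanted):
    """Return (to_remove, to_add) that bring address_list in line with wanted."""
    to_remove = [ip for ip in address_list if ip not in wanted]
    to_add = [ip for ip in wanted if ip not in address_list]
    return to_remove, to_add


wanted = matching_ips(CACHE)
# Hypothetical current content of the address list.
to_remove, to_add = sync(["203.0.113.10", "192.0.2.99"], wanted)
print(wanted, to_remove, to_add)
```

Stale entries are removed first and only the missing addresses are added, which is the same two-pass idea the address-list block of the script uses.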

Export Data From Cassandra to CSV

I needed to move Cassandra data from an old cluster to a new one, so I wrote a tool to help myself export Cassandra data to CSV.

Update: the code has been pushed to GitHub.

Why did I have to write this tool?

Because the free version of Cassandra doesn't come with a backup tool that "actually works" for the data inside Cassandra.
Well, there are a few tools, but if your data is big they are bound to choke, so we have to build our own by feeling our way around the DataStax connection library for Cassandra.

My Java skills are still shallow, so sorry if the code looks a bit clumsy. But I have tested it pulling tables holding tens of GB of data and millions of rows without any problem, so if anyone needs it, just compile it yourself with the DataStax Java library for Cassandra.

package lemonade.dumpCassandra;

import java.text.SimpleDateFormat;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnDefinitions.Definition;
import com.datastax.driver.core.DataType;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;


/**
 * Dump Data from Cassandra to CSV
 * 2015/01/19
 * by sphinxid <firman.gautama@gmail.com>
 *
 */
public class CassExport
{
    public static void main( String[] args )
    {
        String keyspace = "YourKeyspace";
        String table = "TableName";
        String username = "username";
        String password = "password";
        String host = "127.0.0.1";

        Cluster cluster = Cluster.builder()
            .addContactPoints(host)
            .withCredentials(username, password)
            .build();
        Session session = cluster.connect(keyspace);

        Statement stmt = new SimpleStatement("SELECT * FROM " + table);
        stmt.setFetchSize(1000);
        ResultSet rs = session.execute(stmt);

        // Iterating the ResultSet fetches further pages on demand, so every
        // row is visited, including the rows of the last page.
        for (Row row : rs)
        {
            StringBuilder line = new StringBuilder();
            for (Definition key : row.getColumnDefinitions().asList())
            {
                String val = myGetValue(key, row);
                line.append("\"");
                // Double any embedded quote so the CSV field stays parseable
                line.append(val.replace("\"", "\"\""));
                line.append("\"");
                line.append(",");
            }
            // Drop the trailing comma
            line.deleteCharAt(line.length()-1);
            System.out.println(line.toString());
        }

        session.close();
        cluster.close();
    }

    /**
     * Convert one column of a row to its CSV text. Returns an empty string
     * for null values and for anything that cannot be read.
     */
    public static String myGetValue(Definition key, Row row)
    {
        String str = "";

        if (key != null)
        {
            String col = key.getName();

            try
            {
                if (key.getType().equals(DataType.cdouble()))
                {
                    str = Double.toString(row.getDouble(col));
                }
                else if (key.getType().equals(DataType.cint()))
                {
                    str = Integer.toString(row.getInt(col));
                }
                else if (key.getType().equals(DataType.uuid()))
                {
                    str = row.getUUID(col).toString();
                }
                else if (key.getType().equals(DataType.cfloat()))
                {
                    str = Float.toString(row.getFloat(col));
                }
                else if (key.getType().equals(DataType.timestamp()))
                {
                    // getDate() is the driver 2.x accessor for timestamp
                    // columns (driver 3.x renamed it to getTimestamp()).
                    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
                    str = fmt.format(row.getDate(col));
                }
                else
                {
                    // Text columns; any other type throws here and is
                    // exported as an empty field by the catch block.
                    str = row.getString(col);
                }
            } catch (Exception e)
            {
                str = "";
            }
        }

        return (str == null) ? "" : str;
    }

}

Or see the pastebin.
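
The export wraps every field in double quotes and joins the fields with commas. For such a line to stay parseable, a double quote inside a value has to be doubled. Here is a minimal Python sketch of that output layout using the standard `csv` module; the rows are made-up stand-ins for query results, not real data.

```python
import csv
import io

# Hypothetical rows standing in for what a Cassandra query would return.
rows = [
    ("42", 'say "hi"', "2015-01-19 10:00:00+0700"),
    ("43", "", "2015-01-19 10:00:01+0700"),
]

buffer = io.StringIO()
# QUOTE_ALL quotes every field, matching the layout of the Java export.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)
output = buffer.getvalue()
print(output, end="")
```

The second field of the first row comes out as `"say ""hi"""`, which is the form CSV readers expect for embedded quotes.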

#Update

#Benchmark Speed
22 GB of data, ~122 million rows.
Extracted in 444m38.061s.
- 1 Cassandra server host (4 cores, 8 GB RAM, SATA HDD).
- Average of ~4.5k rows / second.
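
The average can be checked from the two figures above with a couple of lines of Python (the row count is the rounded "~122 million", so the result is approximate):

```python
rows = 122_000_000            # "~122 million rows" from the benchmark
seconds = 444 * 60 + 38.061   # "444m38.061s"
rate = rows / seconds
print(f"{rate:.0f} rows/second")
```

That works out to about 4,573 rows per second, in line with the ~4.5k average quoted.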

Simple Article Spinner/reWriter Bahasa Indonesia

Last night I had a lot on my mind and couldn't sleep, so just for fun I coded a very basic article spinner/rewriter for Bahasa Indonesia.
Since this is only very dirty code, sorry if the code is messy 😉

Oh, and I named this free article spinner/rewriter "LunaMaya" =))

The source code is on my GitHub -> https://github.com/sphinxid/LunaMaya

A demo can be seen here -> http://sphinx.cloudapp.net/LunaMaya/

Free Article Spinner/reWriter Bahasa Indonesia

LemonScrap – A PHP Class for Website Scraping

It has been almost 2 years since I last posted on this blog, but starting today, God willing (InsyaAllah), I think I'll start writing random stuff again.

Oh, and a while back I wrote a tutorial on Basic Website Scraping with PHP and cURL. This time I'm releasing a wrapper for doing exactly that. It is called LemonScrap, and the source can be downloaded here.

So what is lemonScrap?

lemonScrap

lemonScrap is a PHP class that makes basic web scraping easier: you just write some rules, which at the moment can be either regex or DOMpath.

You are free to use this class for whatever you like. Usage examples are given in the ls-*.php files.

It uses:

  • cURL (to connect to http)
  • regex (for parsing rules and modifying the result)
  • DOMpath (for rules parsing)
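
To give a feel for rule-based scraping in general, here is a small Python sketch of the idea. It is not lemonScrap's API: the page, the `RULES` table, and the `apply_rules` function are invented for the illustration, and the download step that cURL would handle is replaced by a hard-coded string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page; a real scraper would download this over HTTP first.
PAGE = """<html><head><title>Demo page</title></head>
<body><ul>
<li><a href="/a">First</a></li>
<li><a href="/b">Second</a></li>
</ul></body></html>"""

# Hypothetical rules: a field name mapped to (kind, expression).
RULES = {
    "title": ("regex", r"<title>(.*?)</title>"),
    "links": ("path", ".//li/a"),
}


def apply_rules(page, rules):
    """Run each rule against the page and collect the matches by rule name."""
    tree = ET.fromstring(page)
    result = {}
    for name, (kind, expr) in rules.items():
        if kind == "regex":
            result[name] = re.findall(expr, page, re.S)
        else:
            result[name] = [(a.get("href"), a.text) for a in tree.findall(expr)]
    return result


scraped = apply_rules(PAGE, RULES)
print(scraped)
```

Note that `xml.etree.ElementTree` only accepts well-formed markup, so this sketch works on the tidy sample page but would need a forgiving HTML parser for most real sites.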