I'm learning R, and I'm a big fan of the data.table package - I like its database-like syntax and performance.
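For example, the kind of one-liner I mean (with made-up data, since I'm only illustrating the syntax):

library(data.table)

# Toy data standing in for a real dataset; the column names here are made up.
dt <- data.table(Agency = c("NYPD", "NYPD", "DOT"),
                 ComplaintType = c("Noise", "Noise", "Pothole"))

# Filter rows, count them, and group by a column, all in one expression.
dt[Agency == "NYPD", .N, by = ComplaintType]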
When I was reading web pages and blogs on data analysis, I found this post:
A Large Data Workflow with Pandas: Data Analysis of 8.2 Million Rows with Python and SQLite
https://plot.ly/ipython-notebooks/big-data-analytics-with-pandas-and-sqlite/
I would like to practice this data analysis with data.table; however, my laptop has only 4 GB of RAM:
➜ ~ free -m
total used free shared buff/cache available
Mem: 3686 966 1976 130 743 2359
Swap: 8551 0 8551
➜ ~
The dataset is a 3.9 GB CSV file, and my available memory is not enough to read the whole file into a data.table. But I'm not willing to give up the data.table package.
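For reference, this is roughly the read that runs out of memory (the file name is a placeholder for the downloaded CSV):

library(data.table)

# Trying to load the whole 3.9 GB CSV at once; with only ~4 GB of RAM this
# exhausts memory (or starts swapping) before the data.table is built.
# "311_service_requests.csv" is a placeholder for the actual file name.
dt <- fread("311_service_requests.csv")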
Question:
Is there a database interface for the data.table package? I searched its documentation and had no luck. If data.table is not the right tool for this task, which approach would you recommend: (1) sqldf, (2) SQLite + dplyr, or (3) the ff/bigmemory packages?
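For example, as far as I can tell, option (2) would look roughly like the sketch below (the database file, table, and column names are all placeholders, and a tiny made-up table stands in for the imported CSV):

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)

# On-disk SQLite database; "requests.sqlite" is a placeholder file name.
con <- dbConnect(RSQLite::SQLite(), "requests.sqlite")

# In the real workflow the 3.9 GB CSV would be imported into this table
# (for example in chunks); here a tiny made-up table stands in for it.
dbWriteTable(con, "requests",
             data.frame(Agency = c("NYPD", "NYPD", "DOT")),
             overwrite = TRUE)

# dplyr builds the SQL lazily and SQLite does the work on disk,
# so only the aggregated result has to fit in RAM.
tbl(con, "requests") %>%
  group_by(Agency) %>%
  summarise(n = n()) %>%
  collect()

dbDisconnect(con)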
I've noticed that each of the above packages has its own distinctive syntax. The pandas workflow in the linked post does almost all of these tasks with one set of tools. Is there a similar approach in R?