Questions tagged [ff]

An R package that provides memory-efficient on-disk storage of large data, plus fast functions for accessing it.

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM, by transparently mapping only one section of the file (the pagesize) into main memory at a time.
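A minimal sketch of that idea, assuming the ff package is installed from CRAN. It creates a disk-backed vector with `ff()`, indexes it like an ordinary R vector, and sums it chunk by chunk with `chunk()`, so only one chunk is pulled into RAM at a time. The vector size and the use of a temporary backing file are illustrative choices, not requirements.

```r
# Sketch only: assumes install.packages("ff") has been run.
library(ff)

n <- 1e6

# Disk-backed double vector initialised from 1..n. The data are written to a
# backing file (a temp file by default); ff maps only a window of it into RAM.
x <- ff(as.double(seq_len(n)), vmode = "double")

# The object reports a length like a normal vector but lives in a file.
length(x)
filename(x)

# Ordinary indexing reads just the requested elements into RAM.
x[1:5]

# chunk() splits the index range into pieces sized by the "ffbatchbytes"
# option; each x[i] loads one piece into RAM.
total <- 0
for (i in chunk(x)) {
  total <- total + sum(x[i])
}
print(total)

# Remove the backing file when finished.
delete(x)
rm(x)
```

The chunked loop is the usual pattern for ff: any aggregate that can be combined piece by piece (sums, counts, min/max) can be computed without ever holding the full vector in memory.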


165 questions
10 votes, 3 answers

Subsetting ffdf objects in R

I'm using R's ff package and I've got some ffdf objects (dimensions around 1.5M x 80) that I need to work with. I'm having some trouble getting my head around the efficient slicing/dicing operations though. For instance I've got two integer columns…
Ken Williams
10 votes, 2 answers

How to deal with a 50GB large csv file in r language?

I am relatively new to "large data processing" in R and am looking for some advice on how to deal with a 50 GB CSV file. The current problem is the following. The table looks like: ID,Address,City,States,... (50 more fields of characteristics of a…
windsound
6 votes, 2 answers

What is the meaning of this error "Error in if (any(B < 1)) stop("B too small")" while using tabplot package

I found the tabplot package for visualizing a large database. I ran it using the code below, but I get this error on different data frames: "Error in if (any(B < 1)) stop("B too small") : missing value where TRUE/FALSE needed In addition: Warning…
mql4beginner
6 votes, 4 answers

Import text file using ff package

I have a text file of 4.5 million rows and 90 columns to import into R. Using read.table I get the "cannot allocate vector of size..." error message, so I am trying to import it using the ff package before subsetting the data to extract the observations…
user2568648
5 votes, 1 answer

Convert an ff object to a data.frame

I am working with a big matrix and the ff package. I am loading an ff object and I want to use it to calculate the CRPS (a score). For example, I have an ff_matrix (called Mat, with 25 rows and 7303 columns) which is a precipitation forecast (7303…
Chika
5 votes, 1 answer

delete rows ff package

For a while now I've been using the ff package to work with big data. The R object I've been working with has about 130,000,000 rows and 14 columns. Two of those columns, Temperature and Precipitation, have missing values (“NA”), so I need to delete…
lpchaparro
4 votes, 1 answer

Why is my R code for filtering data producing different results with "fread()" and "ffdf()"?

I have a huge file with 7 million records and 160 variables. I learned that fread() and read.csv.ffdf() are two ways to handle such big data. But when I use dplyr to filter these two data sets, I get different results. Below is a small…
3 votes, 1 answer

Error while reading large file using ff package

I am trying to read a large file (1.51 GB) using the "ff" package. The following command was used: atmins = read.csv.ffdf(file="atmins.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000,…
user3342643
3 votes, 2 answers

Using apply on large ffdfs

The basic idea is this: I have a large ffdf (about 5.5 million rows x 136 fields). I know for a fact that some of the columns in this data frame are entirely NA. How do I find out which ones and remove them appropriately? My instinct is…
Clarinetist
3 votes, 1 answer

Grow a ffdf data frame on disk gradually

From documentation of save.ffdf: Using ‘save.ffdf’ automagically sets the ‘finalizer’s of the ‘ff’ vectors to ‘"close"’. This means that the data will be preserved on disk when the object is removed or the R sessions is closed. Data can be…
qed
3 votes, 1 answer

How to delete ffdf objects directories in R?

I am using ffdf objects from the ff package to do some data pre-processing. My work computer has 4 CPU cores and 8 GB of RAM, and I can handle about 0.2-0.3 billion data points, which is really wonderful. However, I have another constraint. The large ffdf objects…
3 votes, 2 answers

How to convert a factor vector to POSIXct in ff or ffbase

After reading in a large data set with read.csv.ffdf, one of the columns is a time, with values such as 2014-10-18 00:01:02 across 1 million rows. That column is a factor. How do I convert it to POSIXct supported by ff? Simply using as.POSIXct() just…
MM Cui
3 votes, 1 answer

R could not allocate memory on ff procedure. How come?

I'm working on a 64-bit Windows Server 2008 machine with Intel Xeon processor and 24 GB of RAM. I'm having trouble trying to read a particular TSV (tab-delimited) file of 11 GB (>24 million rows, 20 columns). My usual companion, read.table, has…
Waldir Leoncio
3 votes, 0 answers

How to do matrix multiplication with ff objects

Suppose I have ff_matrix (also doesn't work with ffdf) objects called x and y. x is a 100*10 matrix and y is a 10*1 matrix. library(ffbase) x <- as.ffdf(data.frame(matrix(rnorm(100*10),ncol=10))) y <- as.ffdf(data.frame(matrix(rnorm(10)))) x <-…
user2763361
3 votes, 2 answers

ff package in R: how to move data from one drive to another, and change filenames

I am working intensively with the amazing ff and ffbase packages. Due to some technical details, I have to run my R session on my C: drive. After finishing that, I move the generated files to my P: drive (using cut/paste in Windows, NOT using…
Miguel Vazq