1

Hi Stackoverflow Community,

I have a big csv file, basically too big to fit in my computers memory. Therefore, I want to read only the necessary data from the csv file. For example:

Column_A   Column_B   Column_C
 Jan         1         2018
 Jan         4         2019
 Feb         5         2018
 Mar         3         2018

Let's say that I only have an interest in Column_A == 'Jan' and Column_C == "2018".

Is it possible to load only the data where Column_A == "Jan" and Column_C == 2018 (so in this example, only row 1 should be returned).

I found another solution, but unfortunately this was "in memory" (data handling after it was loaded into R):

impordata <- read.csv("big_file.csv")
impordata <- subset(impordata,Column_C ==2018 & Column_A =="Jan")
R overflow
  • 1,292
  • 2
  • 17
  • 37

1 Answers1

1

Try the sqldf package:

For example,

# install.packages('sqldf') #if need be

library(sqldf)
fileCSV <- file('path to csv')
sqldf('select * from fileCSV where Column_A = 'Jan' and Column_C = "2018",
file.format = list(header = TRUE, sep = ',')
)
CT Hall
  • 667
  • 1
  • 6
  • 27