Import a CSV file, based on a specific value

Question

Hi Stackoverflow Community,

I have a big CSV file, too big to fit in my computer's memory. Therefore, I want to read only the necessary data from the file. For example:

Column_A   Column_B   Column_C
Jan        1          2018
Jan        4          2019
Feb        5          2018
Mar        3          2018

Let's say that I am only interested in the rows where Column_A == "Jan" and Column_C == 2018.

Is it possible to load only those rows? In this example, only row 1 should be returned.

The only solution I found works "in memory": it filters the data after the whole file has already been loaded into R:

impordata <- read.csv("big_file.csv")
impordata <- subset(impordata, Column_C == 2018 & Column_A == "Jan")

You can use `sqldf` to read in filtered data. See http://jofrhwld.github.io/blog/2014/05/23/using_sqldf.html — Kerry Jackson, Mar 12 '19 at 14:57

CT Hall · Answer 1 · 2019-03-12T15:35:31.087


Try the sqldf package:

For example,

# install.packages('sqldf') #if need be

library(sqldf)
fileCSV <- file('path to csv')
# The file is loaded into SQLite and filtered there; only matching rows come back to R
sqldf("select * from fileCSV where Column_A = 'Jan' and Column_C = 2018",
      file.format = list(header = TRUE, sep = ','))
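
If adding a package is not an option, a base R alternative is to stream the file in chunks and keep only the matching rows from each chunk, so that only one chunk plus the matches is held in memory at a time. This is a minimal sketch, assuming a plain comma-separated file with a header row and no line breaks inside quoted fields; `read_filtered`, `keep` and `chunk_size` are hypothetical names chosen for illustration, not part of any package:

```r
# Sketch only: read_filtered(), keep and chunk_size are illustrative names.
# Assumes a comma-separated file with a header and no embedded newlines.
read_filtered <- function(path, keep, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header once so that every chunk gets the same column names.
  header <- strsplit(readLines(con, n = 1L), ",", fixed = TRUE)[[1]]
  header <- trimws(gsub('"', "", header, fixed = TRUE))

  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break          # end of file
    lines <- lines[nzchar(lines)]           # skip blank lines
    if (length(lines) == 0L) next

    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE, strip.white = TRUE)

    # keep() returns a logical vector; which() drops NA comparisons.
    pieces[[length(pieces) + 1L]] <- chunk[which(keep(chunk)), , drop = FALSE]
  }

  # Combine the filtered chunks (NULL if the file had no data rows).
  do.call(rbind, pieces)
}
```

For the data in the question, a call could look like `read_filtered("big_file.csv", function(d) d$Column_A == "Jan" & d$Column_C == 2018)`. Column types are guessed per chunk, so a larger `chunk_size` makes the guesses more stable at the cost of more memory.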

edited Mar 12 '19 at 15:35

answered Mar 12 '19 at 15:01

CT Hall


Thanks! But this is only importing specific columns, not based on the values of these columns, right? – R overflow Mar 12 '19 at 15:04
The `c(1,3)` imports the 1st and 3rd columns; you'll want to change that to the indices appropriate to your data. – CT Hall Mar 12 '19 at 15:07
Thanks, that is clear. But let's say that I query Column_A and Column_C (so the c(1,3)), how can I only read in the data where for example Column_A == "Jan" ? – R overflow Mar 12 '19 at 15:12

Ah, I misread this a bit, let me check into it. – CT Hall Mar 12 '19 at 15:14
I've updated it to use sqldf – CT Hall Mar 12 '19 at 15:35
