Read certain columns if exist using read.csv.sql from sqldf

Question

I have very large CSV files (2.3 GB). I only want to read certain columns, which may or may not be present in a given file.

I am using the following code, which was suggested in "Only read limited number of columns":

library(sqldf) 
loc <- read.csv.sql("data.csv",
                    sql = "select locID, City, CRESTA, Latitude, Longitude from file",
                    sep = ",")

How can I handle the case where a column, for example "City", is not in the CSV?

Remove it from the `sql` statement? Or are you referring to how to detect that this column is missing? — Roman Luštrik, Jul 16 '18 at 18:33

G. Grothendieck · Accepted Answer · 2018-07-16T23:43:04.083


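This reads only the header row of the file to find out which columns are available, intersects those names with the names of the wanted columns, and then reads only the columns in the intersection. The fn$ prefix (from gsubfn, which sqldf loads) replaces the backquoted expression in the SQL string with its value.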

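library(sqldf)

nms_wanted <- c("locID", "City", "CRESTA", "Latitude", "Longitude")
# nrows = 0 reads just the header, so this is cheap even for a large file
nms_avail <- names(read.csv("data.csv", nrows = 0))
# keep only the wanted columns that actually exist in the file
nms <- intersect(nms_avail, nms_wanted)
# `toString(nms)` expands to a comma-separated column list
fn$read.csv.sql("data.csv", "select `toString(nms)` from file")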
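The same idea can be sketched without fn$ by building the SQL string in base R, which also makes it easy to report which wanted columns were skipped. This is only a sketch: build_select is a hypothetical helper name (not part of sqldf), the temporary demo file stands in for the real CSV, and it assumes plain column names that need no quoting in SQL.

```r
# Hypothetical helper (not part of sqldf): build a SELECT statement from
# the wanted columns that are actually present in the file's header.
build_select <- function(path, wanted, sep = ",") {
  # Read only the header row to learn which column names exist
  avail <- names(read.csv(path, nrows = 0, sep = sep))
  present <- intersect(wanted, avail)
  missing <- setdiff(wanted, avail)
  if (length(missing) > 0) {
    message("Columns not found and skipped: ", toString(missing))
  }
  if (length(present) == 0) {
    stop("None of the requested columns exist in ", path)
  }
  # toString() joins the names with ", "
  paste("select", toString(present), "from file")
}

# Small demo file standing in for the large CSV; it has no City column
demo <- tempfile(fileext = ".csv")
writeLines(c("locID,CRESTA,Latitude,Longitude,Other",
             "1,A,10.5,20.5,x"), demo)

sql <- build_select(demo, c("locID", "City", "CRESTA", "Latitude", "Longitude"))

# With sqldf installed, the statement can be passed on as in the question:
# loc <- sqldf::read.csv.sql(demo, sql = sql, sep = ",")
```

Stopping when no wanted column is present avoids sending "select  from file" to SQLite, which would otherwise fail with a less obvious syntax error.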

edited Jul 16 '18 at 23:43

answered Jul 16 '18 at 23:01

G. Grothendieck

