1

I have very large csv files (2.3 GB). I only want to read certain columns that could be or could not be there.

I am using the following code that was suggested here Only read limited number of columns

library(sqldf) 
loc <- read.csv.sql("data.csv",
                    sql = "select locID, City, CRESTA, Latitude, Longitude from file",
                    sep = ",")

How can I deal with the situation when for example the column "City" is not in the csv?

janf
  • 81
  • 11

1 Answers1

1

This finds out which columns are available, intersects their names with the names of the columns that are wanted and only reads those.

library(sqldf)

nms_wanted <- c("locID", "City", "CRESTA", "Latitude", "Longitude")
nms_avail <- names(read.csv("data.csv", nrows = 0))
nms <- intersect(nms_avail, nms_wanted)
fn$read.csv.sql("data.csv", "select `toString(nms)` from file")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341