I often use the read.csv function to read large CSV files. The files have no header, so I use the col.names parameter to name the variables in the resulting data frame.
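A minimal sketch of that read.csv approach, assuming a small pipe-delimited sample. The data and every column name other than user_account are hypothetical and only serve as an illustration.

```r
# Hypothetical sample standing in for a header-less, pipe-delimited file.
csv_text <- "1|Foo|10.5\n2|Bar|20.0\n3|Foo|7.25"

# header = FALSE means the first line is data; col.names supplies the names.
df <- read.csv(
  text = csv_text,
  header = FALSE,
  sep = "|",
  col.names = c("id", "user_account", "amount"),      # assumed names
  colClasses = c("integer", "character", "numeric")   # assumed types
)

# With read.csv the whole file is loaded first and filtered afterwards.
foo_rows <- df[df$user_account == "Foo", ]
```

This works for files that fit in memory, because the filter is applied only after every row has been imported.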
Today, for the first time, I had to use read.csv.sql, which is available in the sqldf package. The file to import is very big, and I only need the rows that satisfy a certain condition. According to the online documentation, the filter has to be defined in the WHERE clause of the SELECT statement. Let's say that my file has a column (among others) called user_account and I want to import only the rows where user_account = 'Foo'. I therefore have to write something like:
    df <- read.csv.sql(
      "my_big_data_file.csv",
      sql = "select * from file where user_account = 'Foo'",
      header = FALSE,
      colClasses = c(... Here I define column types ...),
      sep = "|",
      eol = "\n"
    )
Now the problem is that, unlike read.csv, read.csv.sql apparently has no col.names parameter. Since my file has no header, I don't know how to refer to the columns by name. Because I wrote user_account in the WHERE clause of the sql parameter in the code above, I get an error message: R complains that there is no such variable.
So, how can I assign column names with read.csv.sql for a CSV file without a header and, at the same time, refer to those names in my filter? Is this even possible?
Thanks in advance