I am trying to do a rather simple task in R. I have a large .tsv file (4 GB) that I was not able to read into memory, and I would like to read only the rows whose row names appear in a list. The row names are in the first column of the large file, called "PMID", and I have a list of PMIDs that I would like to extract from it.
I am quite new to R. I can use the function match or %in% on a file that is already loaded in R, but I have trouble doing the same while reading the .tsv file. I used read.table to load the large file but got the error "cannot allocate vector of size 250.0 Mb".
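To make the in-memory case concrete, here is a minimal sketch of the %in% filter on data that is already loaded. The data frame `small_df` and the vector `wanted` are hypothetical names used only for illustration, and only three of the columns are included. This is what works for a small file, not a solution for the 4 GB one.

```r
# Hypothetical small sample standing in for a file that fits in memory;
# only three of the twelve columns are reproduced here.
small_df <- data.frame(
  PMID = c("26151967", "26151969", "26151965"),
  au_order = c(1, 2, 3),
  lastname = c("Lau", "Htun", "Lim"),
  stringsAsFactors = FALSE
)

# PMIDs to keep (hypothetical name for illustration)
wanted <- c("26151969", "26151965")

# Keep only the rows whose PMID is in the wanted list;
# rows stay in the order they have in the data frame.
subset_df <- small_df[small_df$PMID %in% wanted, ]
print(subset_df)
```

The difficulty is doing the equivalent filtering without first loading the whole file.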
The large .tsv file is structured as follows:
"PMID" "au_order" "lastname" "firstname" "year" "journal type" "city" "state" "country" "lat" "lon" "fips"
26151967 1 Lau Ying 2016 J Hum Lact EDU Queenstown, Singapore - Singapore 1.299 103.787 NULL
26151969 2 Htun Tha Pyai 2016 J Hum Lact EDU Queenstown, Singapore - Singapore 1.299 103.787 NULL
26151965 3 Lim Peng Im 2016 J Hum Lact EDU-HOS Queenstown, Singapore - Singapore 1.299 103.787 NULL
The PMIDs I want to extract are stored in a vector:
My_vector = c("26151969","26151965")
Desired output:
"PMID" "au_order" "lastname" "firstname" "year" "journal type" "city" "state" "country" "lat" "lon" "fips"
26151969 2 Htun Tha Pyai 2016 J Hum Lact EDU Queenstown, Singapore - Singapore 1.299 103.787 NULL
26151965 3 Lim Peng Im 2016 J Hum Lact EDU-HOS Queenstown, Singapore - Singapore 1.299 103.787 NULL
I would be very thankful for any help. I apologize if this is a duplicate, but even after a long search I could not find an answer that I could understand.