I'm trying to build a scaled-down approach to reading a LARGE (>150GB) CSV file into an R script by chunking it into smaller pieces that can be read in sequentially.
Problem is, one of the column variables is kind of a nested cell, similar to:
ID1,var1,var2,var3,var4,var5,"[[intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6]]"
ID2,var1,var2,var3,var4,var5,"[[intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6]]"
ID3,var1,var2,var3,var4,var5,"[[intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6]]"
I've had some success getting just the nested cell values by running:
cat file.csv | cut -d'"' -f2
But this prints the values in the final column for EVERY row. I would like to be able to pull out each occurrence one at a time (e.g. everything between "[[ and ]]" for one ID) and flatten it into a row/vector of some kind, appended to a single file via >>
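As a rough illustration of the target behavior (not a tested solution for the real file), the sketch below pulls the quoted field for one ID and strips the brackets so it becomes a flat comma-separated row. The sample data, the helper name extract_nested, and the output file flat.txt are all made up for the example, and it assumes each record sits on a single physical line with exactly one quoted field.

```shell
# Hypothetical two-row stand-in for file.csv (values are made up).
cat > sample.csv <<'EOF'
ID1,a,b,c,d,e,"[[1,2,3],[4,5]]"
ID2,a,b,c,d,e,"[[6,7],[8,9,10]]"
EOF

# extract_nested ID FILE  (hypothetical helper)
# Prints the quoted field of the first row whose first column is ID,
# with every [ and ] removed so the values form one flat row.
extract_nested() {
  awk -F'"' -v id="$1" '
    index($0, id ",") == 1 {   # row starts with "ID,"
      v = $2                   # text between the first pair of double quotes
      gsub(/[][]/, "", v)      # drop all square brackets
      print v
      exit                     # stop reading once the row is found
    }' "$2"
}

# Append the flattened row for one ID to a single output file.
extract_nested ID2 sample.csv >> flat.txt
```

Because awk exits at the first match, the rest of the file is not read, which matters at this size; it still scans from the top each time, so calling it once per ID would be slow.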
I tried variations of this solution: How to print a single cell in a csv file using bash script or awk
But it looks like there are some returns in there that are preventing it from working correctly (I get either just the first line via head or just blanks).
I'm sure there's a sed, awk, or grep call that can handle this but I'm drawing a blank.
Edit: It has been brought to my attention that it is unclear what I'm asking for. The short answer is that I want to extract everything between the two " characters for a single line/entry in the CSV,
so that I can pipe an output like:
ID3,var1,var2,var3,var4,var5,"[[intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6],[intvar6,intvar6,intvar6,intvar6]]"
to its own file, with a variation of ID3 in the file name.
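To make the per-ID output concrete, here is a minimal sketch of that splitting step under some assumptions: the sample data, the chunks/ directory, and the ID.csv naming scheme are invented for the example; each record is assumed to be on one physical line; and IDs are assumed to be unique, since a repeated ID would overwrite its earlier file.

```shell
# Hypothetical two-row stand-in for file.csv (values are made up).
cat > sample.csv <<'EOF'
ID1,a,b,c,d,e,"[[1,2,3],[4,5]]"
ID2,a,b,c,d,e,"[[6,7],[8,9,10]]"
EOF

mkdir -p chunks

# Write every row to its own file named after the first column.
awk -F, '{
  sub(/\r$/, "")                # strip a trailing carriage return, if present
  out = "chunks/" $1 ".csv"     # hypothetical naming scheme: chunks/<ID>.csv
  print > out
  close(out)                    # avoid running out of open file handles
}' sample.csv
```

Splitting on commas is safe here only because the ID is the first field, before any quoted text. Each resulting file can then be read into R on its own.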