I have over 100,000 csv files in the below format:
1,1,5,1,1,1,0,0,6,6,1,1,1,0,1,0,13,4,7,8,18,20,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,1,6,5,1,1,1,0,1,0,4,7,8,18,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,2,6,5,1,1,1,0,1,0,4,7,8,18,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,3,6,5,1,1,1,0,1,0,13,4,7,8,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,4,6,5,1,1,1,0,1,0,13,4,7,8,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,5,6,4,1,0,1,0,1,0,4,8,18,20,,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,6,6,5,1,1,1,0,1,0,4,7,8,18,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,7,6,5,1,1,1,0,1,0,13,4,7,8,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,1,0,8,6,5,1,1,1,0,1,0,13,4,7,8,20,,,,,,,,,,,,,,,,,,,,,,,
1,1,5,1,1,2,0,0,12,12,1,2,4,1,1,0,13,4,7,8,18,20,21,25,27,29,31,32,,,,,,,,,,,,,,,,
All I need is field 10 and field 17 onward, field 10 is the counter indicate how many integer stored start from field 17 i.e. what I need is:
6,13,4,7,8,18,20
5,4,7,8,18,20
5,4,7,8,18,20
5,13,4,7,8,20
5,13,4,7,8,20
4,4,8,18,20
5,4,7,8,18,20
5,13,4,7,8,20
5,13,4,7,8,20
12,13,4,7,8,18,20,21,25,27,29,31,32
Max number of integer need to read is 28. I can easily achieve this by Getline in C++, however, from my previous experience, since I need to handle over 100,000 such files and each files may have 300,000~400,000 such lines. Therefore using Getline to read in the data and build a vector> may have serious performance issue for me. I tried to use fscanf to achieve this:
while (!feof(stream)){
fscanf(fstream,"%*d,%*d,%*d,%*d,%*d,%*d,%*d,%*d,%*d,%d",&MyCounter);
fscanf(fstream,"%*d,%*d,%*d,%*d,%*d,%*d"); // skip to column 17
for (int i=0;i<MyCounter;i++){
fscanf(fstream,"%d",&MyIntArr[i]);
}
fscanf(fstream,"%*s"); // to finish the line
}
However, this will call fscanf multiple times and may also create performance issue. Is there any way to read in variable number of integer at 1 call with fscanf ? Or I need to read into a string and then strsep/stoi it ? Compare to fscanf, which is better from performance point of view?