I have to process large CSV files (up to tens of GB) that look like this:
Key,CompletedA,CompletedB
1,true,NULL
2,true,NULL
3,false,NULL
1,NULL,true
2,NULL,true
I have a parser that yields parsed lines as IEnumerable<Record>, so that only one line is read into memory at a time.
Now I have to group the records by Key and check whether the columns CompletedA and CompletedB both have a value somewhere within each group. As output I need the records whose group does not have both CompletedA and CompletedB.
In this case that is the record with key 3.
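To make the requirement concrete, here is a minimal sketch of the result I am after, written as a plain single pass without Rx. The shape of Record (an int Key and two nullable booleans) and the names IncompleteKeys and Find are assumptions made up for this illustration, and "has a value" is taken to mean "is not NULL".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a parsed line; the real Record type may differ.
public sealed record Record(int Key, bool? CompletedA, bool? CompletedB);

public static class IncompleteKeys
{
    // Reads the input once. Memory grows with the number of distinct keys,
    // not with the number of rows.
    public static IReadOnlyList<int> Find(IEnumerable<Record> records)
    {
        // Per key: has any row supplied CompletedA / CompletedB so far?
        var seen = new Dictionary<int, (bool HasA, bool HasB)>();

        foreach (var r in records)
        {
            // For a new key, flags is the default tuple (false, false).
            seen.TryGetValue(r.Key, out var flags);
            seen[r.Key] = (flags.HasA || r.CompletedA.HasValue,
                           flags.HasB || r.CompletedB.HasValue);
        }

        // Keep only the keys whose group is missing at least one of the two columns.
        return seen.Where(kv => !(kv.Value.HasA && kv.Value.HasB))
                   .Select(kv => kv.Key)
                   .ToList();
    }
}
```

This covers only one of the checks and consumes the whole sequence by itself, which is exactly the limitation described next.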
However, there are many similar processing steps running on the same dataset, and I don't want to iterate over it multiple times.
I think I can convert the IEnumerable into an IObservable and use Reactive Extensions to find the records.
Is it possible to do this in a memory-efficient way with a simple LINQ expression over the IObservable collection?