I would like to aggregate data from one CSV file into another, using command-line tools such as bash, sed, awk, etc. Each row in each file begins with a unique ID in column 1. If a row in the destination file matches this ID, the line should be replaced; otherwise it should be appended. The input data is not sorted, and the sort order of the result doesn't matter, so the files can be sorted as part of the script if that helps.
For example, given the current state of aggregate.csv:
1,olddata
3,olddata
2,olddata
And the file new.csv:
4,newdata
2,changeddata
3,changeddata
I would like aggregate.csv to come out as follows (in any sort order):
1,olddata
2,changeddata
3,changeddata
4,newdata
The lines may contain a large number of columns, so replacing one cell at a time is not ideal. The CSVs are guaranteed not to contain quoted newlines, so a line-by-line search that replaces a whole line at a time is a valid approach.
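To illustrate the replace-or-append behaviour I'm after, here is a rough sketch of a one-pass awk merge (my own untested attempt, assuming IDs never contain commas and that reading new.csv into memory is acceptable):

```shell
# Read new.csv first, indexing each full line by its ID (column 1).
# Then stream aggregate.csv: replace matched lines, pass others through,
# and finally append any IDs that never matched an existing line.
awk -F, '
    NR == FNR { new[$1] = $0; next }                    # first file: remember new rows by ID
    $1 in new { print new[$1]; delete new[$1]; next }   # replace a matching old row
    { print }                                           # keep unmatched old rows as-is
    END { for (id in new) print new[id] }               # append rows with brand-new IDs
' new.csv aggregate.csv > merged.csv
mv merged.csv aggregate.csv
```

The `NR == FNR` test is the usual two-file awk idiom: it is true only while the first file (new.csv) is being read.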
Apologies if this is a duplicate, but I can't find another question that uses exactly this CSV-merging approach.
I have tried adapting the answers to this question, but they required first generating a "template file" of all IDs by parsing both files line by line, sorting, removing duplicates, and saving; I hoped a simpler method was possible.
This question has answers in sed and awk that I also tried to replicate; I managed the regex-replacement part, but not a way to append a new line to the file when no match exists.
Thanks!