I have been searching for a way to do this efficiently for a while now and can't come up with a good solution.
The requirement is simple: I have a file in the following format.
$cat mymainfile
rec1,345,field3,....field20
rec1,645,field3,....field20
rec12,345,field3,....field20
frec23,45,field3,....field20
rec34,645,field3,....field20
At the end of the split operation, I want to end up with multiple separate files named like this:
$cat some_prefix_345_some_suffix_date
rec1,345,field3,....field20
rec12,345,field3,....field20
$cat some_prefix_645_some_suffix_date
rec1,645,field3,....field20
rec34,645,field3,....field20
$cat some_prefix_45_some_suffix_date
frec23,45,field3,....field20
I thought of using grep, but it would first have to find the unique ids and then grep for each one, since we don't know which ids (345, 645, etc.) are in the file before reading mymainfile.
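The two-pass grep idea above can be sketched as follows; `pfx_` and `_sfx_date` are placeholder names I'm using for the real prefix and suffix, and the sample input is trimmed to three fields per line:

```shell
# Sample input from the question (trimmed to three fields per line)
cat > mymainfile <<'EOF'
rec1,345,field3
rec1,645,field3
rec12,345,field3
frec23,45,field3
rec34,645,field3
EOF

# Two-pass sketch: collect the unique ids from column 2, then grep
# each one back out into its own file. Works, but rescans the whole
# file once per distinct id, so it scales poorly.
for id in $(cut -d',' -f2 mymainfile | sort -u); do
  grep "^[^,]*,${id}," mymainfile > "pfx_${id}_sfx_date"
done
```

Anchoring the pattern to the second field (`^[^,]*,${id},`) keeps id 45 from also matching 345.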
Then I thought of csplit, e.g. as in Split one file into multiple files based on delimiter, but that splits on a delimiter, not on the value of a specific column.
When it comes to bash scripting, I know I can read the file line by line with a while loop and split it that way, but I don't know if that will be efficient.
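The while-loop approach would look roughly like this; again `pfx_` and `_sfx_date` stand in for the real prefix and suffix, and the input is a trimmed sample:

```shell
# Sample input from the question (trimmed to three fields per line)
cat > mymainfile <<'EOF'
rec1,345,field3
rec1,645,field3
rec12,345,field3
frec23,45,field3
rec34,645,field3
EOF

rm -f pfx_*_sfx_date   # start clean, since the loop appends

# Pure-bash sketch: extract the second field with parameter
# expansion and append each line to the file keyed by that field.
# Correct, but the per-line shell loop is slow on large files.
while IFS= read -r line; do
  id=${line#*,}        # strip everything through the first comma
  id=${id%%,*}         # keep only the second field
  printf '%s\n' "$line" >> "pfx_${id}_sfx_date"
done < mymainfile
```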
I also thought of awk solutions along the lines of awk '$2 == ? { etc., but I don't know how to generate the different filenames. I could do it programmatically in Python, but I'd prefer a single command line, and I know it's possible. I'm tired of searching and still can't figure out the best approach. Any suggestions / best approach would be much appreciated.
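For what it's worth, here is how far I can get with awk: the output filename can be built from $2 itself, so no comparison against a known id is needed. `pfx_` and `_sfx_date` are placeholders for the real prefix and suffix, and the input is a trimmed sample:

```shell
# Sample input from the question (trimmed to three fields per line)
cat > mymainfile <<'EOF'
rec1,345,field3
rec1,645,field3
rec12,345,field3
frec23,45,field3
rec34,645,field3
EOF

# Single pass: concatenate the filename per line from field 2 and
# redirect each record to it.
awk -F',' '{ print > ("pfx_" $2 "_sfx_date") }' mymainfile
```

In awk's redirection, `>` truncates a file only on its first use within the run and appends afterwards, so repeated ids land in the same file; with very many distinct ids, some awks can hit an open-file limit, which calling `close()` on the filename after each print avoids at some performance cost.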