I have a very large CSV file, input.csv, that looks like this:
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
I am trying to split this file into separate files based on the URL in the first column, keeping all the columns of each line.
So the output for the above snippet should be two files:
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
and
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
To split the file on the first column, I am using awk:
awk -F, '{print >> ($1".csv")}' input.csv
However, awk cannot create any of the output files and fails with this error:
awk: cmd. line:1: (FILENAME=input.csv FNR=1) fatal: can't redirect to ` https://www.youtube.com/watch?v=9t5V_sMVN5I.csv' (No such file or directory)
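Using the URL as a filename is apparently what causes the error. Each '/' in the URL is treated as a directory separator, so awk tries to create a file named watch?v=9t5V_sMVN5I.csv inside the directories https:/www.youtube.com/, which do not exist.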
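One way to work around the '/' problem while keeping URL-based names is to sanitize the name before redirecting. The sketch below is one possible approach, not the only fix: it replaces every character outside A-Z, a-z, 0-9, '_' and '-' with '_'. The scratch directory and the two printf sample lines are hypothetical demo setup to keep the example self-contained; against real data you would run only the awk command on your own input.csv. Two different URLs could in principle sanitize to the same name, which would merge their rows.

```shell
# Demo setup (hypothetical): a scratch directory and two sample lines,
# so the example never touches a real input.csv.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89' \
  'https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87' > input.csv

awk -F, '{
  name = $1
  # Turn every character outside A-Z, a-z, 0-9, _ and - into "_",
  # so "/" can no longer be read as a directory separator.
  gsub(/[^A-Za-z0-9_-]/, "_", name)
  print > (name ".csv")
}' input.csv
```

For the first sample URL this writes https___www_youtube_com_watch_v_9t5V_sMVN5I.csv.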
Is there any way to split on column 1 ($1) with awk but give the output files different names, for example a sequence numbered 1..N? The other option would be to replace every URL with a unique identifier and then split on that, but I have not yet managed to script this.
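For the 1..N naming, a minimal sketch (not a definitive implementation) keeps an awk associative array that maps each URL to the next number; writing that mapping out also covers the unique-identifier idea. The file name index.txt, the scratch directory and the four sample lines are hypothetical, chosen for illustration. The sample deliberately returns to the first URL on its last line to show that the input does not need to be grouped by URL.

```shell
# Demo setup (hypothetical): scratch directory and sample data.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89' \
  'https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87' \
  'https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87' \
  'https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89' > input.csv

awk -F, '
  # First time a URL is seen: give it the next number and record
  # the number and URL in index.txt.
  !($1 in id) {
    id[$1] = ++n
    print n, $1 > "index.txt"
  }
  # Write the whole line to <number>.csv. Within one awk run, ">"
  # truncates a file only when it is first opened; later prints append.
  { print > (id[$1] ".csv") }
' input.csv
```

Here 1.csv receives the three 9t5V_sMVN5I lines, 2.csv the b7kKTSVbfdA line, and index.txt records which URL each number stands for. Using > instead of >> also means that rerunning the command overwrites old output rather than appending duplicates. With a very large file and many distinct URLs, many output files stay open at once; the gawk manual says gawk multiplexes open files when it reaches the system limit, but other awk implementations may fail with a "too many open files" error. In that case, sorting the input by URL first and calling close() on the previous file whenever $1 changes is a common workaround.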
Any help would be appreciated!