I have a very large CSV file, input.csv, that looks like this:

https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89

I am trying to split this file into separate files, one per distinct URL in the first column, keeping all the columns of each line.

So the output for the above snippet should be two files:

https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89

and

https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89

To split this file based on the first column, I am using awk thus:

awk -F, '{print >> ($1".csv")}' input.csv

However, I am unable to save to any file based on the URL field because of this error:

awk: cmd. line:1: (FILENAME=input.csv FNR=1) fatal: can't redirect to `    https://www.youtube.com/watch?v=9t5V_sMVN5I.csv' (No such file or directory)

Using the URL-style string as the filename appears to be the problem: the many '/' characters are presumably being interpreted as separators in a file path.

Is there any way to split the contents on column 1 ($1) using awk, but such that the output files are named differently, perhaps numbered in sequence 1..N? The other option is to replace every URL with some unique identifier and then split on that -- however, I have not yet been able to script this.

Any help would be appreciated!

AruniRC
  • @Sundeep perfect! I had no idea how to split the string nested within the awk command. Please add this as an answer so I can accept it! – AruniRC Oct 28 '16 at 05:59

2 Answers

Since the first column has a regular format, with the string after '=' serving as a unique identifier, we can use that string as the filename:

awk -F, '{split($1,a,"="); print > (a[2]".csv")}' input.csv

$ cat b7kKTSVbfdA.csv
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89

$ cat 9t5V_sMVN5I.csv
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
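
The question also asks about naming the output files 1..N. A minimal sketch of that idea, assuming each distinct value of $1 should get the next number in order of first appearance; sample.csv and map.txt are hypothetical names used only for illustration:

```shell
# Hypothetical sample data; substitute your own input.csv.
cat > sample.csv <<'EOF'
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
EOF

# The first time a URL is seen, give it the next number and record the
# pairing in map.txt; then write every line to <number>.csv.
awk -F, '
    !($1 in id) { id[$1] = ++n; print n, $1 > "map.txt" }
    { print > (id[$1] ".csv") }
' sample.csv
```

Here 1.csv collects the lines for the first URL and 2.csv the lines for the second, while map.txt lets you look up which number belongs to which URL afterwards.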

Sundeep

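Because the URL contains '/' characters, awk treats it as a path into directories that do not exist. You can strip everything up to the '=' and use the remainder as the filename: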

awk -F, '{filename=$1;sub(".*=","",filename);print >> (filename".csv")}' input.csv
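
Since the input is described as very large, it may contain many distinct URLs, and some awk implementations limit how many output files can be open at once. A hedged sketch of one way around that, assuming it is acceptable to close each file whenever the key changes and reopen it in append mode; sample.csv is a hypothetical stand-in for input.csv:

```shell
# Hypothetical sample data; substitute your own input.csv.
cat > sample.csv <<'EOF'
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
EOF

# Start clean, because >> appends to files left over from an earlier run.
rm -f 9t5V_sMVN5I.csv b7kKTSVbfdA.csv

awk -F, '
{
    key = $1
    sub(/.*=/, "", key)            # keep only the text after the last "="
    out = key ".csv"
    if (out != prev) {             # key changed: release the previous file
        if (prev != "") close(prev)
        prev = out
    }
    print >> out                   # append, so a reopened file is not truncated
}' sample.csv
```

Only one output file is open at any time, at the cost of a reopen whenever the URL changes. If lines for the same URL are already grouped together, as in the question's sample, each file is opened just once.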
wang sky