2

Is it possible to split a csv file, vertically, into multiple files? I know we can split single large files into smaller files with no of rows mentioned using the command line. I have csv files in which columns are repeating after certain column no and I want to split that file column-wise.Is that possible with the command line, If not then how can we do it with python?

For Eg. enter image description here

consider above sample in which site and address present multiple times vertically, I want to create 3 different csv files containing single site and single address

Any help would be highly appreciated,

Thanks

Amit Sharma
  • 2,297
  • 3
  • 19
  • 25
  • 1
    you might be interested in taking a look at the `cut` command – Moinuddin Quadri Jan 07 '18 at 07:56
  • 1
    indeed [cut](https://en.wikipedia.org/wiki/Cut_(Unix)#Examples) is your friend. Please provide an actual input example if you want an adapted solution – ishahak Jan 07 '18 at 08:24
  • @ishahak, I just edited the question, So basically I want to convert such files to multiple files and each file should contain only 2 rows i.e. site and address – Amit Sharma Jan 07 '18 at 08:49
  • [Split dataframe into multiple output files](https://stackoverflow.com/questions/10002021/split-dataframe-into-multiple-output-files) –  Jan 07 '18 at 10:47
  • If any of the answers helped you, please accept it. It is a good habit... – ishahak Jan 08 '18 at 03:43
  • 1
    @ishahak, I have this good habit, Thanks :) – Amit Sharma Jan 08 '18 at 08:35

3 Answers3

3

Assuming your input files is named ~/Downloads/sites.csv and looks like this:

Google,google.com,Google,google.com,Google,google.com
MS,microsoft.com,MS,microsoft.com,MS,microsoft.com
Apple,apple.com,Apple,apple.com,Apple,apple.com

You can use cut to create 3 files, each containing one pair of company/site:

cut -d "," -f 1-2 < ~/Downloads/sites.csv > file1.csv
cut -d "," -f 3-4 < ~/Downloads/sites.csv > file2.csv
cut -d "," -f 5-6 < ~/Downloads/sites.csv > file3.csv

Explanation:

For the cut command, we declare the comma (,) as a separator, which splits every line into a set for 'fields'.

We then specify for each output file, which fields we want to be included.

HTH!

ishahak
  • 6,585
  • 5
  • 38
  • 56
0

If the site-address pairs are regularly repeated, how about:

awk '{
    n = split($0, ary, ",");
    for (i = 1; i <= n; i += 2) {
        j = (i + 1) / 2;
        print ary[i] "," ary[i+1] >> "file" j ".csv";
    }
}' input.csv
tshiono
  • 21,248
  • 2
  • 14
  • 22
0

The following script produces what you want (based on the SO answer adjusted for your needs: number of columns, field separator). It splits the original file vertically into 2 column chunks (note n=2) and creates 3 different files (tmp.examples.1, tmp.examples.2, tmp.examples.3 or whatever you specify for the f variable):

awk -F "," -v f="tmp.examples" '{for (i=1; i<=NF; i++) printf (i%n==0||i==NF)?$i RS:$i FS > f "." int((i-1)/n+1) }' n=2 example.txt

If your example.txt file has the subsequent data:

site,address,site,address,site,address
Google,google.com,MS,microsoft.com,Apple,apple.com
marcell
  • 1,498
  • 1
  • 10
  • 22