I'm writing a script that will need to take in a large file in the following format:

name1
datadatadatadata1
name2
datadatadatadata2
name3
datadatadatadata3.......

The file is about 150,000 lines long, but this varies, so I can't just use split -l. I also need to know how many files I am splitting into for downstream purposes. Is there any way I can use split -n and ensure that it always puts an even number of lines into each file?

danimal
  • I suggest taking a look here: http://stackoverflow.com/questions/3066948/how-to-file-split-at-a-line-number – Rivasa Dec 18 '13 at 23:36
  • What is the `-n` flag for? My version of split doesn't have it. And I don't understand why you can't use the `-l` flag. `split -l 100` will put an even number of lines (= 100) in each outfile, except possibly the last one. For example, if an input file has 149 lines, how would you like to split it? – janos Dec 18 '13 at 23:38
  • Because I need to know how many files I will end up with, and the line count varies with the input file, I don't think split -l will work. – danimal Dec 18 '13 at 23:40
  • -n specifies the number of outfiles resulting from the split – danimal Dec 18 '13 at 23:41
  • If the restriction applies to both the number of files and the number of lines per file, you should simply count the lines in the whole file first and then split it accordingly. – Rubens Dec 18 '13 at 23:42
  • Yeah, I figured. I was just hoping there was a trick someone knew about to make things easier. Thanks! – danimal Dec 18 '13 at 23:44
  • Do you need to know how many files it is split into, or do you need to set the number? If it is just knowledge you seek, use `-l` and then count the result. – William Pursell Dec 19 '13 at 13:28
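
The last suggestion above (split with a fixed `-l`, then count the result) could be sketched roughly as follows. The function name `split_and_count` and the output prefix argument are illustrative assumptions, not part of the thread; the sketch also assumes no stale files with the same prefix are left over from an earlier run, since those would be counted too.

```shell
#!/bin/sh
# Sketch: split a file into chunks of an even number of lines, then print
# how many chunk files were produced.
# split_and_count is a hypothetical helper name used for illustration.
split_and_count() {
    fname=$1
    lines_per_file=$2   # must be even so name/data pairs stay together
    prefix=$3           # output prefix, e.g. "part_"

    if [ $((lines_per_file % 2)) -ne 0 ]; then
        echo "lines_per_file must be even" >&2
        return 1
    fi

    split -l "$lines_per_file" "$fname" "$prefix" || return 1

    # Count the files matching the prefix by expanding the glob into the
    # function's positional parameters.
    set -- "$prefix"*
    if [ ! -e "$1" ]; then
        # The glob matched nothing (for example, the input was empty).
        echo 0
        return 0
    fi
    echo "$#"
}

# Usage (hypothetical file name):
#   nfiles=$(split_and_count somefile 1000 part_)
```

This gives you the file count after the fact, but it does not let you choose the count in advance.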

1 Answer

Specify the file that you want to split (fname) and the number of files (nfiles) that you want to split it into. The script then calculates how many lines go into each file and does the split:

#!/bin/sh
nfiles=5
fname="somefile"

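# Count the lines in the input file.
totlines="$(wc -l < "$fname" | awk '{print $1}')"
# Ceiling of totlines/nfiles, then rounded up to the next even number.
lines_per_file=$(( (totlines+nfiles-1)/nfiles ))
lines_per_file=$(( lines_per_file + lines_per_file % 2 ))
# Fail if the last of the nfiles files would get fewer than 2 lines.
[ $((lines_per_file*(nfiles-1)+2 )) -gt "$totlines" ] && { echo Failed ; exit 1 ; }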

split -l "$lines_per_file" "$fname"

Note that, mathematically, with a single invocation of split, not all combinations of parameters admit a solution. As an example, suppose fname has 50 lines and you want to split it into 12 files each with an even number of lines. There is no possible solution. At 4 lines per file, split would need 13 files. At 6 lines per file, split would need only 9 files.
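
To check in advance whether a given combination works, the same arithmetic can be wrapped in a small function. This is only a sketch: the name `even_lines_per_file` is made up for illustration, and it assumes the total line count is even (as it is for name/data pairs) and rejects odd totals. It prints the even `-l` value that makes `split` produce exactly the requested number of files, or returns a non-zero status if there is none.

```shell
#!/bin/sh
# even_lines_per_file TOTLINES NFILES
# Hypothetical helper: prints the even lines-per-file value for which
# `split -l` yields exactly NFILES files, or returns 1 if none exists.
even_lines_per_file() {
    totlines=$1
    nfiles=$2

    [ "$nfiles" -ge 1 ] || return 1
    # An odd total cannot be divided into files of even length.
    [ $((totlines % 2)) -eq 0 ] || return 1

    lpf=$(( (totlines + nfiles - 1) / nfiles ))  # ceiling division
    lpf=$(( lpf + lpf % 2 ))                     # round up to even
    [ "$lpf" -gt 0 ] || return 1

    # split -l creates ceil(totlines / lpf) files; a larger lpf can only
    # give fewer files, so this is the only candidate worth testing.
    got=$(( (totlines + lpf - 1) / lpf ))
    [ "$got" -eq "$nfiles" ] || return 1

    echo "$lpf"
}
```

With the numbers from the example above, `even_lines_per_file 50 12` fails, while `even_lines_per_file 50 5` prints 10.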

John1024