2

I have a rather large file (150 million lines of 10 characters each). I need to split it into 150 files of 2 million lines, with each output line being alternately the first 5 characters or the last 5 characters of the source line. I could do this in Perl rather quickly, but I was wondering if there was an easy solution using bash. Any ideas?

Jon Seigel
Sklivvz
  • I think you need to be a bit clearer on what the transformation is exactly. (That is, I don't get it.) Perhaps a small example? – mweerden Sep 15 '08 at 15:25

4 Answers

2

Homework? :-)

I would think that a simple pipe with sed (to split each line into two) and split (to split things up into multiple files) would be enough.

The man command is your friend.


Added after confirmation that it is not homework:

How about

sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 2000000 - out-prefix-

?
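To see what this pipeline produces, here is a small-scale sketch. The three sample lines, the temporary directory, and the `-l 4` chunk size are made up for illustration (the real job would use `-l 2000000`), and it assumes GNU sed, which interprets `\n` in the replacement text as a newline.

```shell
# Demo on 3 lines of 10 characters; each line becomes 2 output lines.
workdir=$(mktemp -d)
printf '%s\n' 'aaaaaAAAAA' 'bbbbbBBBBB' 'cccccCCCCC' > "$workdir/input_file"

# Split every line into its two halves, then cut the stream into
# pieces of 4 lines each (split names them out-prefix-aa, -ab, ...).
sed 's/\(.....\)\(.....\)/\1\n\2/' "$workdir/input_file" |
    split -l 4 - "$workdir/out-prefix-"

# Show what landed in each piece.
for piece in "$workdir"/out-prefix-*; do
    echo "== ${piece##*/}"
    cat "$piece"
done
```

Note that the pattern is not anchored, so it rewrites the first 10 characters of any line that has at least 10; that is harmless for fixed-width input like the file in the question, but it will insert newlines into files with longer lines.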

HD.
  • Great! In the end I used this: for file in *.txt; do echo $file; sed 's/\(.....\)\(.....\)/\1\r\n\2/' $file | split -l 2000000 - $file.part.; done – Sklivvz Sep 15 '08 at 18:11
  • This messed up the formatting of a large CSV file in the outputs; it added newlines where there shouldn't be any. – lacostenycoder Jul 14 '22 at 14:54
0

I think that something like this could work:

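# in_file must already hold the path to the source file.
out_file=1
out_pairs=0
# IFS= and -r make read keep each line exactly as written.
while IFS= read -r line; do
    # Each source line yields 2 output lines, so 1,000,000 pairs fill one file.
    if [ "$out_pairs" -ge 1000000 ]; then
        out_file=$((out_file + 1))
        out_pairs=0
    fi
    echo "${line%?????}" >> "out${out_file}"   # first 5 characters
    echo "${line#?????}" >> "out${out_file}"   # last 5 characters
    out_pairs=$((out_pairs + 1))
done < "$in_file"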

Not sure if it's simpler or more efficient than using Perl, though.
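The two parameter expansions in the loop do the actual slicing, and they can be checked in isolation. A minimal sketch, where the sample value of `line` and the names `first` and `last` are made up for illustration:

```shell
line='abcdeFGHIJ'

# %????? strips the shortest suffix matching five characters,
# leaving the first 5 of a 10-character line.
first="${line%?????}"

# #????? strips the shortest prefix matching five characters,
# leaving the last 5 of a 10-character line.
last="${line#?????}"

echo "$first"
echo "$last"
```

Both expansions are plain POSIX shell, so this part does not depend on bash specifically.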

che
0

This is the "first five chars of each line" variant, assuming that the large file is called x.txt and that it's OK to create files in the current directory with names x.txt.*:

split -l 2000000 x.txt x.txt.out && (for splitfile in x.txt.out*; do outfile="${splitfile}.firstfive"; echo "$splitfile -> $outfile"; cut -c 1-5 "$splitfile" > "$outfile"; done)
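The same loop could also write the last-five variant by adding a second `cut` with `-c 6-10`. A small-scale sketch, where the sample data, the temporary directory, the `-l 2` chunk size, and the `.lastfive` suffix are assumptions for illustration:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'aaaaaAAAAA' 'bbbbbBBBBB' 'cccccCCCCC' > "$workdir/x.txt"

# Cut the source into pieces of 2 lines: x.txt.outaa, x.txt.outab, ...
split -l 2 "$workdir/x.txt" "$workdir/x.txt.out"

# The glob is expanded once, before the loop starts, so the files
# created inside the loop are not picked up as inputs.
for splitfile in "$workdir"/x.txt.out*; do
    cut -c 1-5  "$splitfile" > "${splitfile}.firstfive"
    cut -c 6-10 "$splitfile" > "${splitfile}.lastfive"
done

ls "$workdir"
```

This keeps the two halves in separate files rather than interleaving them line by line, so it is a different output layout from the one the question asks for.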

Troels Arvin
0

Why not just use the native Linux split command?

split -d -l 999999 input_filename

This will write the pieces to new files with names like x00, x01, x02...

For more info, see the manual:

man split
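A quick way to check the naming is to run it on a tiny file. This sketch assumes GNU split (the `-d` numeric-suffix option is a GNU extension); the sample data, the temporary directory, and the explicit `x` prefix are made up for illustration:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'line1' 'line2' 'line3' 'line4' 'line5' > "$workdir/input_filename"

# -d asks for numeric suffixes; the last argument is the output prefix.
split -d -l 2 "$workdir/input_filename" "$workdir/x"

# Expect x00 and x01 with 2 lines each, and x02 with the remaining line.
wc -l "$workdir"/x0*
```

On its own this only cuts the file into pieces; it does not split each line into its first and last five characters.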
lacostenycoder