I'm trying to split a very large text file into multiple smaller ones. When I run the code below, the first file it creates is correct, but every file after that contains only the 'INSERT INTO ...' line and nothing else. Thanks in advance.

import math
interval = 100000

with open('my-big-file','r') as c:
    for i, l in enumerate(c):
        pass
    length = i + 1

    numOfFiles = int(math.ceil(length / interval))

with open('my-big-file','r') as c:
    for j in range(0, numOfFiles):
        with open('my-smaller-file_{}.sql'.format(j),'w') as n:
            print >> n, 'INSERT INTO codes (code, some-field, some-other-field) VALUES'
            for i, line in enumerate(c):
                if i >= j * interval and i < (j + 1) * interval:
                    line = line.rstrip()
                    if not line: continue

                    print >> n, "(%s,'something','something else')," % (line)

                else:
                    break
Rory Daulton
knnnrd

1 Answer

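The inner enumerate(c) restarts its count at 0 for every output file, even though the file object c carries on from where the previous loop stopped. For every j >= 1 the first i is therefore 0, the range check fails, and the loop breaks right after the header has been written. You don't need to count the lines beforehand to fix this; while iterating over the file, just start a new output file every time the given number of lines has been written: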

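#!/usr/bin/env python
import os
import sys


def split(fn, num=1000, suffix="_%03d"):
    """Split the file fn into pieces of at most num lines each."""
    full, ext = os.path.splitext(fn)
    out = None
    try:
        with open(fn, 'r') as f:
            for i, line in enumerate(f):
                if i % num == 0:
                    # Start a new output file every num lines.
                    if out is not None:
                        out.close()
                    # Integer division keeps the file index an int on Python 3.
                    out = open(full + suffix % (i // num) + ext, 'w')
                out.write(line)
    finally:
        # Close the last output file (there is none if the input was empty).
        if out is not None:
            out.close()


if __name__ == '__main__':
    split(sys.argv[1])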

You can run this from the command line, passing the file to split as the only argument. That said, the Unix split command is probably more useful, since it supports many more options.

It's also possible to rewrite this code so that the output files are opened with a with statement as well, but that's another topic.
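For what it's worth, here is one way such a rewrite might look. This is only a sketch, not the only way to do it: the name split_with is made up for illustration, and it swaps the line counter for itertools.islice, which reads the next num lines from wherever the previous chunk stopped. It assumes that num lines fit comfortably in memory.

```python
import itertools
import os


def split_with(fn, num=1000, suffix="_%03d"):
    """Split fn into files of at most num lines; return the paths written.

    split_with is a hypothetical name used for this sketch only.
    """
    full, ext = os.path.splitext(fn)
    written = []
    with open(fn, 'r') as f:
        for index in itertools.count():
            # islice continues from where the previous chunk ended,
            # so no line counter is needed.
            chunk = list(itertools.islice(f, num))
            if not chunk:
                break  # input exhausted
            path = full + suffix % index + ext
            # Each output file is closed as soon as its with block ends.
            with open(path, 'w') as out:
                out.writelines(chunk)
            written.append(path)
    return written
```

Because each chunk is written inside its own with block, no output file stays open if an error occurs part-way through, and an empty input simply produces no files.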

Community
Jan Christoph Terasa