Good day everyone!

I have a huge file:

1| something
2| something else
2| something else 2
2| something else 3
3| something else 4
3| something else 5
5| something else 6
...
28| something else 29

What I need is to split this one file into 28 different files: file1 containing every line that starts with 1|, file2 every line that starts with 2|, and so on.

The file is about 400GB. Is there a performant, easy way to do this?

Thanks a lot!

edit:

This is what I've done, and it takes ages:

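    # r_file is the already-open input file
    var = []
    for line in r_file:
        var.append(line)  # loads the whole file into memory
    r_file.close()
    for i in range(1, 29):
        w_file = open('/file' + str(i) + '.txt', 'a', encoding='utf-8')
        for line in var:  # rescans every line once per output file
            if line.startswith(str(i) + '|'):
                w_file.write(line)
        w_file.close()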
jigga
  • what about just reading a line and writing it to the relevant file in a for loop? – Omer Ben Haim Jul 14 '20 at 14:38
  • I do not think there is a way to do this that is both easy AND performant. The easy way is to iterate over the file line by line with a file pointer. A "harder" approach is to first split the file into several smaller files (for example, four quarters) and then process them with multiprocessing, which is considerably harder to implement – Kev1n91 Jul 14 '20 at 14:44
  • https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python I think this SO thread can help you – Kev1n91 Jul 14 '20 at 14:45
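
The chunked approach mentioned in the comments needs split points that fall on line boundaries, otherwise a line is cut in half between two workers. Below is a minimal sketch of that one step, assuming a newline-terminated file. `chunk_ranges` and its parameters are hypothetical names for illustration, not something from the thread.

```python
import os


def chunk_ranges(path, n_chunks):
    """Return (start, end) byte ranges that cover the file and end on line breaks."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for k in range(1, n_chunks):
            # Jump to the approximate split point, never behind the previous one.
            f.seek(max(size * k // n_chunks, bounds[-1]))
            f.readline()  # move forward to the end of the current line
            bounds.append(min(f.tell(), size))
    bounds.append(size)
    # Drop empty ranges, which appear when a chunk is shorter than one line.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]
```

Each worker would then open the file itself, seek to its `start`, and read lines until it reaches `end`. Whether this beats a single sequential pass depends largely on the disk, since the job is mostly I/O.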

1 Answer

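You can open 29 output files in Python and copy each line to the matching one: 28 files for the prefixes 1 to 28, plus a 29th for lines that match none of them.
The code below creates the files automatically and reads the input one line at a time, so the whole file never has to fit in memory.
The comments explain what each step does.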

Python

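    # Index 0 is a placeholder so that files[n] is the file for prefix n.
    # files[29] collects lines whose prefix is not a number from 1 to 28.
    files = [None]

    # Open the output files in write mode
    for i in range(1, 30):
        files.append(open("file" + str(i) + ".txt", "w", encoding="utf-8"))

    try:
        # Copy each line to its file; the input is streamed line by line
        with open("original_file.txt", "r", encoding="utf-8") as orig:
            for line in orig:
                try:
                    fno = int(line.split("|", 1)[0])
                except ValueError:
                    fno = 29  # non-numeric prefix, e.g. a blank line
                if not 1 <= fno <= 28:
                    fno = 29
                files[fno].write(line)
    finally:
        # Close the output files, even if the copy was interrupted
        for f in files[1:]:
            f.close()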
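
For a file of this size, decoding every line to text is avoidable work, since only the bytes before the first `|` decide where a line goes. Here is a variant sketch that reads and writes in binary mode with larger buffers. The function name `split_by_prefix`, the `out_dir` parameter, and the 1 MiB buffer size are assumptions for illustration, and whether it is measurably faster depends on the disk.

```python
import os
from contextlib import ExitStack


def split_by_prefix(src_path, out_dir, max_prefix=28, buf_size=1024 * 1024):
    """Copy each line of src_path into out_dir/file<N>.txt, N being its numeric prefix."""
    with ExitStack() as stack:
        # One handle per prefix, plus one catch-all at max_prefix + 1.
        outs = {}
        for n in range(1, max_prefix + 2):
            path = os.path.join(out_dir, "file%d.txt" % n)
            outs[n] = stack.enter_context(open(path, "wb", buffering=buf_size))
        catch_all = outs[max_prefix + 1]
        src = stack.enter_context(open(src_path, "rb", buffering=buf_size))
        for line in src:
            head = line.split(b"|", 1)[0]
            try:
                n = int(head)  # int() accepts ASCII digits given as bytes
            except ValueError:
                n = 0  # not a number: falls through to the catch-all
            outs.get(n, catch_all).write(line)
```

`ExitStack` closes every file when the block exits, including when an error interrupts the copy, so no separate closing loop is needed.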

Thank you

Dinesh