
There is a .txt file with data like the following:

1 00000001.setts 0x 
2 00000002.setts 0x 
3 00000003.setts 0x 
4 00000004.setts 0x 
...
59876 0000e9e4.setts 0x 
59877 0000e9e5.setts 0x 
59878 0000e9e6.setts 0x 

The number of lines is always dynamic and rarely ends on a round number (there are about 100k of them). How can I implement, in code, the splitting of such a large file of lines into smaller .txt files of 1500 lines each?

I should clarify that I have already tried to implement this, but unfortunately not everything gets read and some of the data is lost (only 59514 lines out of 59878 are read).

The file is pretty big; each line consists of two values, followed by a space and a \n.

schaef
  • Please specify what the issue is: do you want to split the large text file into smaller text files, or is your issue that not all the lines are being read by Python? Everyone is answering according to the question you asked, while you are commenting that the issue is the number of lines being read. – GodWin1100 Oct 26 '22 at 11:26

3 Answers


Welcome to Stack Overflow. This question has already been answered elsewhere; see this post: Splitting large text file into smaller text files by line numbers using Python.
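
For reference, a minimal sketch along the lines of that post, chunking the file with itertools.islice (the output file names here are just examples):

from itertools import islice

def split_by_lines(path, lines_per_file=1500):
    with open(path, "r") as f_in:
        part = 1
        while True:
            # Pull the next chunk of up to lines_per_file lines;
            # an empty list means we reached the end of the file
            chunk = list(islice(f_in, lines_per_file))
            if not chunk:
                break
            # Lines keep their trailing "\n", so writelines needs no separator
            with open(f"part_{part}.txt", "w") as f_out:
                f_out.writelines(chunk)
            part += 1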

mowgle
  • I've seen this answer, but it didn't help me, like many other methods (maybe I'm doing something wrong). My txt file consists of 59878 lines, of which only 59514 are read, and I don't really understand what the reason may be. – schaef Oct 26 '22 at 10:58

Hi and welcome to Stack Overflow. I recommend reading some basic tutorials about Reading and Writing Files in Python. The only other thing you need is a simple for-loop, which is another basic you should understand.

If you have your own code and concrete questions about it, you can post it here for help.
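
To illustrate those basics, a minimal read/write loop could look like this (the file names are just placeholders):

with open("input.txt", "r") as f_in, open("output.txt", "w") as f_out:
    # Iterating over a file object yields one line at a time
    for line in f_in:
        f_out.write(line)  # each line already ends with "\n"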

tturbo

You can do something like this:

def split_file(filename, filename_prefix, filename_suffix, max_lines):
    line_counter = 1
    file_counter = 1
    with open(filename, "r") as f_in:
        # Read the input file line by line
        for line in f_in:
            # Start a new output file once the current one holds max_lines lines
            if line_counter > max_lines:
                file_counter += 1
                line_counter = 1

            # If you need to do something with the data (e.g. remove the number
            # at the beginning), you can add the relevant code here

            # Append the line to the current output file; `line` already ends
            # with "\n", so it is written unchanged
            with open(f"{filename_prefix}{file_counter}{filename_suffix}", "a") as f_out:
                f_out.write(line)

            line_counter += 1

# Usage
split_file("long_file.txt", "new_short_file_", ".txt", 1500)

# New files will be named sequentially:
# new_short_file_1.txt
# new_short_file_2.txt
# etc.

This should split your large file into smaller ones that have 1500 lines each.
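
One thing to keep in mind about this sketch: because the output file is opened in append mode for every single line, re-running the script will append to files left over from a previous run. Deleting the old output files first, or opening each output file once per 1500-line chunk, avoids that (and is also considerably faster).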

marek
  • Hi, this code is good, but again I have the problem that it doesn't read all the data: only 59514 lines out of 59878. :( – schaef Oct 26 '22 at 11:19
  • Is the data being written into the file while the script is running? If yes, then you probably have to modify the script. I suggest taking a look at [this](https://stackoverflow.com/questions/60655818/read-from-file-while-it-is-being-written-to-in-python). – marek Oct 26 '22 at 11:33
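
For context, the pattern discussed in that linked question looks roughly like the following sketch (the poll interval is an assumption, and it loops forever, so a real script would need a stop condition):

import time

def follow(path):
    # Yield lines as they are appended to the file (like `tail -f`)
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # assumed poll interval; wait for new data
                continue
            yield line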