1

I have a text file with the contents shown below. I want to split it into multiple files (1.txt, 2.txt, 3.txt, ...), where each output file holds one block, as shown under "Desired output". The code I tried doesn't split the input file properly. How can I split the input file into multiple files?

My code:

#!/usr/bin/python

with open("input.txt", "r") as f:
    a1=[]
    a2=[]
    a3=[]
    for line in f:
        if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
        row = line.split()
        a1.append(str(row[0]))
        a2.append(float(row[1]))
        a3.append(float(row[2]))
f = open('1.txt','a')
f = open('2.txt','a')
f = open('3.txt','a')
f.write(str(a1)) 
f.close()

Input file:

A
x
k
..
$$

A
z
m
..
$$

A
B
l
..
$$

Desired output 1.txt

A
x
k
..
$$

Desired output 2.txt

A
z
m
..
$$

Desired output 3.txt

A
B
l
..
$$
erhan

6 Answers

3

Read the input file line by line into a buffer. Each time you find a "$$" line, write the buffer to the next output file, increment the output file counter, and reset the buffer. Code:

with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skip the empty lines
            buff.append(line)
        if line.strip() == "$$":
            # end of a block: write the buffered lines to the next numbered file
            with open('%d.txt' % i, 'w') as output:
                output.write(''.join(buff))
            i += 1
            buff = []  # buffer reset

EDIT: collecting the lines in a list and joining them once with ''.join() should be efficient too, see https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation

maazza
  • @maazza your code gives this error: `Traceback (most recent call last): File "split.py", line 8, in <module> buff.append(line) AttributeError: 'str' object has no attribute 'append'` – erhan Mar 10 '16 at 17:04
1

Try the re.findall() function:

import re

with open('input.txt', 'r') as f:
    data = f.read()

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

[open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]

Minimalistic approach for the first 3 occurrences:

import re

found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found[:3]]

Some explanations:

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

finds all occurrences that match the regular expression and puts them into a list called found.

[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found]

iterates (using a list comprehension) over all elements of the found list and, for each element, creates a text file named "<index of the element + 1>.txt" and writes that element (occurrence) to it.
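The same idea can also be written with enumerate() and with-blocks. This avoids found.index(), which returns the position of the first equal element (so two identical blocks would end up with the same file name), and it closes every file explicitly. A minimal sketch that reuses the pattern from above; the names extract_blocks and write_blocks and the out_dir parameter are made up for illustration and are not part of the original answer:

```python
import os
import re

# Same pattern as above: each match runs from an "A" up to the next "\n$$".
BLOCK_RE = re.compile(r'\n*(A.*?\n\$\$)\n*', re.M | re.S)


def extract_blocks(text, limit=None):
    """Return the matched blocks, optionally only the first `limit` of them."""
    found = BLOCK_RE.findall(text)
    return found if limit is None else found[:limit]


def write_blocks(blocks, out_dir="."):
    """Write block number n to '<n>.txt' inside out_dir (hypothetical helper)."""
    for n, block in enumerate(blocks, start=1):
        with open(os.path.join(out_dir, "{}.txt".format(n)), "w") as fout:
            fout.write(block)
```

For the first three blocks only, something like write_blocks(extract_blocks(data, limit=3)) should do, where data is the text read from input.txt.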

Another version, without regular expressions:

blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'

with open('35916503.txt', 'r') as f:
    fn = 1
    data = []
    write_block = False
    for line in f:
        if fn > blocks_to_read:
            break 
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
                data = []
            fn += 1

PS: Personally, I don't like this version and would use the RegEx one.

MaxU - stand with Ukraine
  • The regular expression you used is too restrictive. We can't say for sure that that format will hold for all inputs. – Chuck Mar 10 '16 at 12:46
  • @ChuckLoganLim, I think the OP might have '\n's in text-blocks – MaxU - stand with Ukraine Mar 10 '16 at 12:48
  • @MaxU Your code works properly. Can you please explain what each line does? And I think you can write another code without re.findall() function for the same aim? I wouldn't like to use re.findall() function :) – erhan Mar 10 '16 at 17:17
  • @MaxU And how can you arrange your code to get only three output files? Is that possible? Thanks. – erhan Mar 10 '16 at 17:25
  • @erhan, then you would have to explain how you want to write, for example, 5 found occurrences into 5 files - what is the rule? – MaxU - stand with Ukraine Mar 10 '16 at 17:27
  • @MaxU I would like to have only three output files without using re.findall() function. That function makes me so confused. I hope you can help me? – erhan Mar 10 '16 at 17:50
  • @erhan, you didn't explain how would you like to distribute 4-5 blocks into 3 files or you are absolutely sure that you will always have only 3 blocks in your input file? – MaxU - stand with Ukraine Mar 10 '16 at 17:58
  • @MaxU I would like to distribute one block into each output file. I have more than 100 blocks but I need only the first three blocks. – erhan Mar 10 '16 at 18:00
  • @erhan, so block#4 would go to file#1, block#5 will go to file#2, etc. ? – MaxU - stand with Ukraine Mar 10 '16 at 18:02
  • @MaxU No, block#1 would go to file#1, block#2 would go to file#2 and block#3 would go to file#3. – erhan Mar 10 '16 at 18:04
  • @erhan, I've updated my "re.findall()" answer so that it writes only the first 3 blocks, and I will add another version without RegEx's a bit later... – MaxU - stand with Ukraine Mar 10 '16 at 18:19
  • @erhan @MaxU this adds a new line at the beginning of the second and third files and what is the purpose of the `write_blk` ? – maazza Mar 10 '16 at 22:11
0

It looks to me like the condition you should be checking for is a line that contains just the newline (\n) character. When you encounter such a line, write out the contents parsed so far, close the file, and open another one for writing.
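The blank-line approach described above could be sketched as follows. This is only an illustration: it assumes that a block never contains a blank line itself, and the names split_on_blank_lines and split_file and the out_dir parameter are hypothetical. Runs of several blank lines count as a single separator here, so no empty output files are produced.

```python
import os
from itertools import groupby


def split_on_blank_lines(lines):
    """Group consecutive non-blank lines into blocks; blank lines only separate."""
    blocks = []
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            blocks.append("".join(group))
    return blocks


def split_file(in_path, out_dir="."):
    """Write block n of in_path to '<n>.txt' in out_dir; return the block count."""
    with open(in_path) as src:
        blocks = split_on_blank_lines(src)
    for n, block in enumerate(blocks, start=1):
        with open(os.path.join(out_dir, "{}.txt".format(n)), "w") as out:
            out.write(block)
    return len(blocks)
```

Calling split_file("input.txt") would then write 1.txt, 2.txt, ... into the current directory.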

Chuck
0

Open 1.txt at the beginning for writing. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
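A rough sketch of that approach, with one deliberate deviation: the next file is opened only when the next non-blank line arrives, so no empty file is left behind after the last "$$". The function name split_on_delimiter and its parameters are invented for this example.

```python
import os


def split_on_delimiter(in_path, out_dir=".", delimiter="$$"):
    """Copy lines to 1.txt, 2.txt, ...; a delimiter line ends the current file."""
    count = 0
    out = None
    with open(in_path) as src:
        for line in src:
            if out is None:
                if not line.strip():
                    continue  # skip blank lines between blocks
                count += 1
                out = open(os.path.join(out_dir, "{}.txt".format(count)), "w")
            out.write(line)
            if line.strip() == delimiter:
                out.close()
                out = None
    if out is not None:  # input ended without a closing delimiter
        out.close()
    return count
```

split_on_delimiter("input.txt") returns the number of files written. Since it streams the input line by line, it should also cope with inputs that are too large to read into memory at once.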

0

The blocks are divided by empty lines. Try this:

import sys

lines = sys.stdin.readlines()
i = 1
o = open("{}.txt".format(i), "w")
for line in lines:
    if len(line.strip()) == 0:
        # blank line: close the current file and start the next one
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()
0

A very easy way, if you want to split it into 2 files for example, would be:

with open("myInputFile.txt",'r') as file:
    lines = file.readlines()

with open("OutputFile1.txt",'w') as file:
    for line in lines[:int(len(lines)/2)]:
        file.write(line)

with open("OutputFile2.txt",'w') as file:
    for line in lines[int(len(lines)/2):]:
        file.write(line)

Making that dynamic would be:

with open("inputFile.txt",'r') as file:
    lines = file.readlines()

Batch = 10
increase = len(lines) // Batch  # lines per output file
start = 0
for i in range(1, Batch + 1):
    # the last file also takes the remainder when len(lines) is not a multiple of Batch
    end = len(lines) if i == Batch else start + increase
    with open("splitText_" + str(i) + ".txt",'w') as file:
        for line in lines[start:end]:
            file.write(line)
    start = end
Enrique Benito Casado