
I have a very long text file that I want to split into smaller files. It looks like this: ***200302 abcdfg ***200303 fasafafd ***200304 dajhskjsd

I want the contents between the *** markers to be saved as new files named 1.txt, 2.txt, 3.txt, and so on.

I have tried without success the suggestions posted in another discussion thread (How can I split a text file into multiple text files using python?)

I have also tried the code below, which fails with an error on line 6: SyntaxError: unexpected character after line continuation character.

with open ('filename.txt','r') as fo:

    op=''
    start=0
    cntr=1
    for x in fo.read().split(*\n*):
        if (x=='***'):
            if (start==1):
                with open (str(cntr)+'.txt','w') as opf:
                    opf.write(op)
                    opf.close()
                    op=''
                    cntr+==1
            else:
                start=1

        else:
            if (op==''):
                op = x
            else:
                op=op + '\n' + x

    fo.close()
Andurush
  • _I have also tried to use the code below which showed error._ --> Please [edit] your question and add the complete error that you get. Please do this always when you ask such a question. – Ocaso Protal Nov 15 '19 at 09:24

1 Answer


PLEASE! NEXT TIME ADD THE ERRORS THAT YOU GET TO YOUR QUESTION!

First of all, there are two syntax errors in your code:

for x in fo.read().split(*\n*): # It's not *\n* but '\n'!

and

cntr+==1 # It's += !

These are easy to spot when you read the error messages carefully!

Once you fix these errors your code will run, but it will omit the last section of your file, because a section is only written when the next *** is encountered!

I assume that your file looks like this:

***
200302 abcdfg
***
200303 fasafafd
***
200304 dajhskjsd

So to get the last section too, just add an if at the end (btw: no need for parentheses in such simple ifs):

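with open('filename.txt', 'r') as fo:
    op = ''      # text collected for the current section
    start = 0    # becomes 1 once the first *** has been seen
    cntr = 1     # number used for the next output file
    for x in fo.read().split("\n"):
        if x == '***':
            if start == 1:
                # A new marker ends the current section: write it out
                with open(str(cntr) + '.txt', 'w') as opf:
                    opf.write(op)
                op = ''
                cntr += 1
            else:
                start = 1
        else:
            if not op:
                op = x
            else:
                op = op + '\n' + x

    # The last section has no closing ***, so write it here
    if start == 1 and op:
        with open(str(cntr) + '.txt', 'w') as opf:
            opf.write(op)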

This can also be simplified to

with open ('filename.txt','r') as fo:

    start=1
    cntr=0
    for x in fo.read().split("\n"):
        if x=='***':
            start = 1
            cntr += 1
            continue
        with open (str(cntr)+'.txt','a+') as opf:
            if not start:
                x = '\n'+x
            opf.write(x)
            start = 0

No need for .close() when you are using with! And I'm pretty sure that you can simplify this even more.
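One possible further simplification, as a minimal sketch: split the whole text on the marker instead of looping line by line. The helper names split_sections and write_sections are made up for this example. Note that this version behaves slightly differently from the code above: it treats *** as a separator wherever it appears (not only on a line of its own), strips surrounding whitespace from each section, and skips empty sections.

```python
from pathlib import Path


def split_sections(text, marker="***"):
    """Return the non-empty, whitespace-stripped chunks between markers."""
    return [chunk.strip() for chunk in text.split(marker) if chunk.strip()]


def write_sections(src, out_dir=".", marker="***"):
    """Write each section of src to 1.txt, 2.txt, ... inside out_dir.

    Existing files with the same names are overwritten.
    Returns the number of files written.
    """
    sections = split_sections(Path(src).read_text(), marker)
    for number, section in enumerate(sections, start=1):
        Path(out_dir, f"{number}.txt").write_text(section)
    return len(sections)
```

Calling write_sections('filename.txt') would then create 1.txt, 2.txt, ... in the current directory. Because the files are opened for writing rather than appending, running it twice does not duplicate content.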

Ocaso Protal
  • Thanks for the suggestions. Next time I will surely add the error description in the message. I tried to run the code above and an error appears for line 5 (for x in fo.read()...) – Andurush Nov 16 '19 at 10:22
  • Thanks for the suggestions. Next time I will surely add the error description in the message. I tried to run the code you suggested above and an error appears for line 5 (for x in fo.read()...) The error description is long, but the last line of the error description says: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 10: invalid start byte". Please let me know if you have any suggestions to solve this. – Andurush Nov 16 '19 at 10:28
  • Of course I have a suggestion: Use the correct encoding of your file when you open it, see also https://stackoverflow.com/questions/19699367/unicodedecodeerror-utf-8-codec-cant-decode-byte and lots of other questions for that error here on SO. Or convert your text file to utf-8 if that is possible :) – Ocaso Protal Nov 16 '19 at 13:40
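To illustrate the encoding suggestion, here is a rough sketch that tries UTF-8 first and then falls back to another encoding. The choice of cp1252 is only a guess: byte 0x97 is a dash character in Windows-1252, which is a common cause of this error, but the actual encoding of the file is not known from the thread. The function name read_text_with_fallback is invented for this example.

```python
def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn and return (text, encoding_used).

    The default list is an assumption; replace it with the real
    encoding of the file if it is known.
    """
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as fo:
                return fo.read(), encoding
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")
```

If the encoding is known for certain, passing it directly, as in open('filename.txt', 'r', encoding='cp1252'), is simpler than guessing. The returned text can then be split with the same loop as in the answer.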