Python : split text to list of lines

Question

I am new to Python, and I have a text file like this:

12345 | 6789 | abcd | efgh

I want my output to look like this:

12345
6789
abcd
efgh

=====================

I don't really know how to write this script. I have made many attempts using functions such as split(), strip(), and so on, but none of them worked, so I am asking for help.

I would appreciate any help.

with open('contacts_index1.txt') as f:
    lines = f.read().splitlines("|")

The real code is: `f.readlines()` then loop over them all splitting on '|' – supreme Pooba, Sep 29 '16 at 20:45
`Traceback (most recent call last): File "C:\Users\TOSHIBA\Desktop\findme.py", line 4, in <module> r = f.read() File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 529: character maps to <undefined>` – Sameh Weangy, Sep 30 '16 at 02:46

score 1 · Accepted Answer · edited May 23 '17 at 11:48

From all of your comments, it looks like the issue has to do with the actual text in the file, and not the ability to parse it. Everyone's solution here is on the right track; you just need to force the encoding.

The error you are describing is described in this other StackOverflow post.

with open('contacts_index1.txt', 'r', encoding='utf-8') as f:
    # force the encoding when opening, then put each field on its own line
    lines = f.read().replace("|", "\n")

EDIT: The issue appears to be a character that could not be decoded. You can tell open to ignore any characters it can't decode.

import io
with io.open("contacts_index1.txt", errors="ignore") as f:
    lines = f.read().replace("|", "\n")
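
As a rough, self-contained sketch of the `errors="ignore"` approach: the snippet below writes a small sample file containing the undecodable byte 0x81, reads it back while dropping that byte, and builds an actual list of fields. The temporary file, the sample bytes, and the helper name `read_fields` are assumptions made for illustration; cp1252 is used because it is the codec named in the traceback.

```python
import os
import tempfile

def read_fields(path, encoding="cp1252"):
    """Return the '|'-separated fields of a file as a list of clean strings.

    Bytes that cannot be decoded are silently dropped (errors="ignore").
    """
    with open(path, encoding=encoding, errors="ignore") as f:
        text = f.read()
    fields = []
    for line in text.splitlines():
        for part in line.split("|"):
            part = part.strip()
            if part:  # skip empty fields
                fields.append(part)
    return fields

# Hypothetical sample data: byte 0x81 has no mapping in cp1252.
sample = b"12345 | 6789 | ab\x81cd | efgh\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

fields = read_fields(tmp.name)
os.remove(tmp.name)
print(fields)  # ['12345', '6789', 'abcd', 'efgh']
```

Note that ignoring errors discards data, so it is only a fallback for when the file's real encoding cannot be determined.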

answered Sep 30 '16 at 03:16 by TheF1rstPancake; edited May 23 '17 at 11:48 by Community

You are right, but it still gives me the same error. I have no idea what to do. – Sameh Weangy Sep 30 '16 at 03:48
I also saved the file as "UTF-8" and the error still appears. – Sameh Weangy Sep 30 '16 at 03:51
Try the solution in that link. The error you show has it loading with encoding "CP1252". So there has to be a way to force it to read with a different encoding (like utf-8). `import io` `io.open('contacts_index1.txt', encoding="utf-8")` OR `io.open('contacts_index1.txt', encoding="latin-1")` – TheF1rstPancake Sep 30 '16 at 03:52
Another option is to tell Python to throw out the character if it can't decode it: `io.open("contacts_index1.txt", errors="ignore")` http://stackoverflow.com/questions/3284827/python-3-chokes-on-cp-1252-ansi-reading – TheF1rstPancake Sep 30 '16 at 03:55
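
To see why the codec choice matters, here is a small sketch that decodes the same bytes with the three codecs mentioned in these comments. The sample bytes are made up for illustration; only the byte 0x81 is taken from the traceback.

```python
# Hypothetical data containing the byte reported in the traceback.
raw = b"abcd \x81 efgh"

results = {}
for codec in ("cp1252", "utf-8", "latin-1"):
    try:
        raw.decode(codec)
        results[codec] = "ok"
    except UnicodeDecodeError as exc:
        # exc.reason holds the codec's explanation of the failure
        results[codec] = "failed: " + exc.reason

for codec, outcome in results.items():
    print(codec, "->", outcome)
```

latin-1 never fails because it maps every byte to a character, but that does not guarantee the decoded text is what the author of the file intended, so it is still worth finding out which encoding the file was actually written in.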

PassionInfinite · Answer 2 · 2016-09-30T03:54:41.177

1

You will have to specify the file's encoding when you open it. The following code should work:

def dataFunction(filename):
    with open(filename, encoding="utf8") as f:
        return f.read()

Call this function with filename as parameter:

Contents = dataFunction("contacts_index1.txt")
elements = Contents.split("|")
for element in elements:
    # strip the spaces and newlines around each field
    print(element.strip())

answered Sep 30 '16 at 03:19 by PassionInfinite; edited Sep 30 '16 at 03:54

Sorry, I didn't understand what exactly I am supposed to do. – Sameh Weangy Sep 30 '16 at 03:22
Sorry again, but it gave me this: `Traceback (most recent call last): File "C:\Users\TOSHIBA\Desktop\hope.py", line 4, in <module> Contents = dataFunction("contacts_index1.txt") File "C:\Users\TOSHIBA\Desktop\hope.py", line 3, in dataFunction return f.read().decode('utf-8') File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 529: character maps to <undefined>` – Sameh Weangy Sep 30 '16 at 03:31
Check it out, I have updated the answer. If it does not work, leave a comment and I will try to help. – PassionInfinite Sep 30 '16 at 03:44
If it does not work, try encoding="Latin-1". If that also does not work, check the file's encoding with an online tool and change the second argument accordingly. @SamehWeangy – PassionInfinite Sep 30 '16 at 03:53
I already did, and the error I posted above still appears. I really appreciate what you are trying to do. – Sameh Weangy Sep 30 '16 at 03:54
Remove that decode call. The exception is thrown by that decode call. – PassionInfinite Sep 30 '16 at 03:55
Also, if you want to check your file's encoding, use this tool: https://nlp.fi.muni.cz/projects/chared/ – PassionInfinite Sep 30 '16 at 03:58
Upvote that comment so that other users will know. You're welcome. – PassionInfinite Sep 30 '16 at 04:11
I tried, but it says that users with less than 15 reputation can't vote. – Sameh Weangy Sep 30 '16 at 04:15

score 0 · Answer 3 · answered Sep 29 '16 at 20:32

Some problems with the code you posted:

f.read() reads the whole file into one string instead of one line at a time. To work line by line, iterate over the file object or use f.readline().
What is the function splitlines?

Your question is pretty unclear in several respects. Maybe this snippet could be of some help:

with open('contacts_index1.txt') as f:
    for line in f:
        elements = line.split('|')
        for element in elements:
            # strip the spaces and the trailing newline around each field
            print(element.strip())

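Edited: I didn't know the function splitlines and just looked it up. It splits a string at line boundaries, and its optional argument is a keepends flag rather than a delimiter, so the way you used it in your code isn't correct anyway.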
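
A short sketch of the difference between the two methods, using a made-up two-line string rather than the asker's file:

```python
# Hypothetical sample text with two lines.
text = "12345 | 6789\nabcd | efgh\n"

# splitlines() breaks only at line boundaries; its optional argument is
# `keepends`, a flag that keeps the newline characters, not a delimiter.
by_line = text.splitlines()
with_ends = text.splitlines(True)

# split() is the method that takes a delimiter.
by_pipe = [part.strip() for part in by_line[0].split("|")]

print(by_line)    # ['12345 | 6789', 'abcd | efgh']
print(with_ends)  # ['12345 | 6789\n', 'abcd | efgh\n']
print(by_pipe)    # ['12345', '6789']
```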

answered Sep 29 '16 at 20:32 by 怀春춘

Your code gave me this error: `Traceback (most recent call last): File "C:\Users\TOSHIBA\Desktop\csvfind.py", line 1, in <module> for line in open('contacts_index1.txt'): File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 529: character maps to <undefined>` – Sameh Weangy Sep 30 '16 at 02:56
Sorry, I meant line.split() – Sameh Weangy Sep 30 '16 at 02:57

score 0 · Answer 4 · answered Sep 29 '16 at 20:34

I strongly suggest using the csv module for this kind of task, as this looks like a csv-type file that uses '|' as the delimiter:

import csv
with open('contacts_index1.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        # each row is the list of fields found on one line
        print("\n".join(field.strip() for field in row))
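
For a quick check without touching the real file, the same idea can be sketched against an in-memory string. The sample text is hypothetical; for an actual file you would pass `encoding=` and `newline=''` to `open()` instead of using `io.StringIO`.

```python
import csv
import io

# Hypothetical in-memory stand-in for the file's contents.
sample = io.StringIO("12345 | 6789 | abcd | efgh\n")

fields = []
for row in csv.reader(sample, delimiter="|"):
    # csv.reader splits on the delimiter but keeps the surrounding spaces,
    # so each field is stripped before it is stored.
    fields.extend(field.strip() for field in row)

print("\n".join(fields))
```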

answered Sep 29 '16 at 20:34 by Juan Albarracín

I have tried csv as well, but here is the error: `Traceback (most recent call last): File "C:\Users\TOSHIBA\Desktop\csvfind.py", line 4, in <module> for row in reader: File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 529: character maps to <undefined>` – Sameh Weangy Sep 30 '16 at 02:54

dawg · Answer 5 · 2016-09-30T04:30:37.497

0

Please do this line by line. There is no need to read the entire file at once.

Something like:

with open(file_name) as f_in:
    for line in f_in:
        for word in line.split('|'):
            print(word.strip())

If it is a Unicode issue, the decoding is handled automatically most of the time:

$ cat /tmp/so.txt
12345 | 6789 | abcd | éfgh

(note the é in the file)

The program above works. If it does NOT work under Python 2, decode each line with a codec:

with open(fn) as f_in:
    for line in f_in:
        line = line.decode('utf-8')  # Python 2 only; use whatever codec the file was written with
        for word in line.split('|'):
            print(word.strip())

With Python 3, just set the encoding when you open the file:

with open(fn, encoding='utf-8') as f_in:   # <= replace with the encoding of the file...
    for line in f_in:
        for word in line.split('|'):
            print(word.strip())

answered Sep 30 '16 at 03:53 by dawg; edited Sep 30 '16 at 04:30

@dawg Thanks, it worked with one file, but the other one has an encoding problem, so I am still stuck on this decoding issue. – Sameh Weangy Sep 30 '16 at 04:03
If it is an encoding issue, just open with a proper codec. – dawg Sep 30 '16 at 04:16
I worked out another solution for the encoding using your script, and now it works. Thank you. But if I need to write the output to a new file instead of printing it in the shell, what should I do? – Sameh Weangy Sep 30 '16 at 04:22
Python 2 or 3? Makes a difference. Python 2, use `encode` and `decode` on each end as [here](http://stackoverflow.com/a/17246997/298607). Python 3, use the encoder embedded in [open](https://docs.python.org/3/library/functions.html#open) – dawg Sep 30 '16 at 04:23
it's python-3.5.0 – Sameh Weangy Sep 30 '16 at 04:26
Open the files with the proper encoding argument. The decoding and encoding are transparent from that point on. – dawg Sep 30 '16 at 04:27
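
A minimal Python 3 sketch of that last suggestion, writing the fields to a new file instead of printing them. The helper name `split_to_file`, the output file name, and the UTF-8 encoding are assumptions for illustration; substitute the encoding your file actually uses.

```python
import os
import tempfile

def split_to_file(src, dst, encoding="utf-8"):
    """Write every '|'-separated field of src on its own line in dst."""
    with open(src, encoding=encoding) as f_in, \
         open(dst, "w", encoding=encoding) as f_out:
        for line in f_in:
            for word in line.split("|"):
                word = word.strip()
                if word:  # skip empty fields
                    f_out.write(word + "\n")

# Hypothetical demo files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "contacts_index1.txt")
    dst = os.path.join(tmp_dir, "output.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("12345 | 6789 | abcd | éfgh\n")
    split_to_file(src, dst)
    with open(dst, encoding="utf-8") as f:
        written = f.read()

print(written)
```

Passing `encoding=` to both `open()` calls means the text is decoded on the way in and encoded on the way out, with no explicit `encode` or `decode` calls in between.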
