How to read a file in 'utf-8'

Question

I have a txt file that contains one last name per line. Some of the last names contain the special letter 'Ñ'.

Apellidos200.txt

 Ramos      
 Rios       
 Arias      
 Muñoz

To parse and read this file I use this code.

import io

apellidos_list = list()
with io.open('Apellidos200.txt', encoding='utf-8') as fp:
    for line in fp:
        x = line.replace('\t', '')
        x = x.replace('\'', '')  # I tried this
        x = x.replace('\n', '')
        x = x.replace('\r', '')
        x = x.replace('\\', '')
        x = x.replace('"', '')  # And tried this
        apellidos_list.append(repr(x))

Output:

     'Ramos'        
     'Rios'     
     'Arias'        
     'Muñoz'

The problem is that the strings are stored with single quotes that I cannot remove. I guess that is because of the 'utf-8' encoding.

I concatenate each string to build a URL, e.g. example.com/Ramos, but the single quotes remain, giving example.com/'Ramos', and this causes an error when I call 'requests.get'.

Edit: I added a screenshot of the code in the debugger.

Don't use `repr()` then. Why did you add that in the first place? — Martijn Pieters, Nov 29 '17 at 15:28
Just to be clear: this has **nothing** to do with reading data; you **add** the quotes by using `repr()`. — Martijn Pieters, Nov 29 '17 at 15:29
I don't see any backslashes in your input data. The remaining `str.replace()` calls can all be replaced with a single `str.strip()` call. You can replace the entire loop with `apellidos_list = [line.strip() for line in fp]`. — Martijn Pieters, Nov 29 '17 at 15:31
Thanks for your response @MartijnPieters. I got the idea of using `repr()` from this answer: https://stackoverflow.com/a/147756/5280246. However, I deleted `repr()`, and the problem appears before that, when the for loop starts. — Alejoo, Nov 30 '17 at 03:33
@MartijnPieters, I added a screenshot from the debugger. As you can see, I remove backslashes because they appear at the beginning. — Alejoo, Nov 30 '17 at 03:35
There are no backslashes *at all* in your screenshot. You have a string value (denoted by the quotes; these are *not part of the value*, they just denote the type of object) with two tab characters at the end. They are not backslashes and `t` characters. All you need to use is `line = line.strip()`. — Martijn Pieters, Nov 30 '17 at 07:31
You seem to be getting confused by the Python `str` object `repr()` result. Python gives you the object in a way you can copy and paste into another Python program, for ease of debugging. In this format non-printable characters are given as escape sequences. So the tab characters are shown as `\t` escape sequences. — Martijn Pieters, Nov 30 '17 at 07:32
Oh thanks!!! That was it. I deleted `repr()` and the list was stored correctly. Thanks @MartijnPieters — Alejoo, Nov 30 '17 at 10:29

score 0 · Answer 1 · answered Nov 30 '17 at 11:00

You are storing the representation of your strings. repr() is a debugging tool; it outputs a valid Python expression that reproduces your string. So you get a string that contains a valid Python string literal, surrounding quotes included, with any non-printable characters (and, on Python 2, any non-ASCII characters) replaced with escape sequences (which always start with \ followed by a single character, or x plus 2 hex digits, u with 4 hex digits, or U with 8, depending on the codepoint).
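As a rough illustration of the difference, the sketch below compares a string with its repr() on Python 3. The sample line and the variable names (raw, stored, cleaned, url_bad, url_good) are made up for this example and do not come from the question's file.

```python
# Hypothetical line as it might be read from the file: a name followed by
# two real tab characters and a newline.
raw = 'Muñoz\t\t\n'

# repr() builds a NEW string holding a Python literal: it adds the quotes
# and spells each tab and newline as a backslash escape sequence.
stored = repr(raw)
print(len(raw), len(stored))  # the repr() result is longer than the value

# strip() removes the surrounding whitespace and adds nothing.
cleaned = raw.strip()

# Quotes only end up in the URL when repr() is applied to the value.
url_bad = 'example.com/' + repr(cleaned)
url_good = 'example.com/' + cleaned
print(url_bad)
print(url_good)
```

The quotes shown by a debugger or the interactive prompt are this same repr() output; they are not stored in the value itself.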

Don't use repr(). All you have is strings with some whitespace (tabs and newlines), so str.strip() is all you need:

import io

apellidos_list = []
with io.open('Apellidos200.txt', encoding='utf-8') as fp:
    for line in fp:
        apellidos_list.append(line.strip())

or use a list comprehension:

with io.open('Apellidos200.txt', encoding='utf-8') as fp:
    apellidos_list = [line.strip() for line in fp]
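To check the whole flow end to end, here is a minimal sketch that writes a small sample file, reads it back with strip(), and builds the URLs. The sample contents, the temporary directory, and the urls list are assumptions for illustration; skipping blank lines is an extra step that is not part of the code above.

```python
import io
import os
import tempfile

# Assumed sample data mimicking the question's file: leading spaces,
# trailing tabs, one name per line.
sample = u' Ramos\t\t\n Rios\t\t\n Arias\t\t\n Muñoz\n'

# Write the sample to a temporary location so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'Apellidos200.txt')
with io.open(path, 'w', encoding='utf-8') as fp:
    fp.write(sample)

# Read it back, stripping whitespace and skipping any empty lines.
with io.open(path, encoding='utf-8') as fp:
    apellidos_list = [line.strip() for line in fp if line.strip()]

# Plain concatenation now yields URLs without stray quotes.
urls = ['example.com/' + apellido for apellido in apellidos_list]
print(urls)
```

Each entry in urls can then be passed to whatever HTTP call you use, with a scheme such as https:// added in front.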
