I'm trying to turn this string into a list:

with open("animals.txt", "r") as f:
    g = f.read()  # whole file as one string
g1 = g.split(",")
print g1  # list of words

I'm getting:

['\x93SHEEP\x94', '\x94TIGER\x94', '\x94LION\x94', '\x94DEER\x94',
'\x94PIG\x94', '\x94DOG\x94', '\x94CAT\x94', '\x94SHARK\x94',
'\x94RAT\x94', '\x94EEL\x94']

What I want is:

['SHEEP', 'TIGER', 'LION', 'DEER', 'PIG', 'DOG', 'CAT', 'SHARK', 'RAT', 'EEL']

How can I do this?

Malik Brahimi
David Washington

3 Answers

You can use encode('ascii', 'ignore') to drop the non-ASCII characters, but first you need to tell Python that your strings are Unicode, which you can do with decode('unicode_escape'):

>>> l
['\x93SHEEP\x94', '\x94TIGER\x94', '\x94LION\x94', '\x94DEER\x94', '\x94PIG\x94', '\x94DOG\x94', '\x94CAT\x94', '\x94SHARK\x94', '\x94RAT\x94', '\x94EEL\x94']
>>> [i.decode('unicode_escape').encode('ascii','ignore') for i in l]
['SHEEP', 'TIGER', 'LION', 'DEER', 'PIG', 'DOG', 'CAT', 'SHARK', 'RAT', 'EEL']
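
The snippet above relies on Python 2's str.decode. As a minimal sketch of the same "ignore anything that is not ASCII" idea on Python 3, assuming the data is handled as bytes (the raw value below stands in for the contents of animals.txt, and strip_non_ascii is a hypothetical helper name):

```python
# Python 3 sketch: these bytes simulate what reading animals.txt in "rb" mode might return.
raw = b'\x93SHEEP\x94,\x94TIGER\x94,\x94LION\x94'

def strip_non_ascii(data):
    """Split comma-separated bytes and drop every non-ASCII byte from each item."""
    # errors='ignore' silently skips bytes that are not valid ASCII (here \x93 and \x94).
    return [item.decode('ascii', 'ignore') for item in data.split(b',')]

words = strip_non_ascii(raw)
print(words)  # ['SHEEP', 'TIGER', 'LION']
```

Note that this discards every non-ASCII byte, so it would also remove accented letters if the file contained any.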
Adam Smith
Mazdak

Try putting this at the top of your code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
taesu

Try decoding your strings with:

g.decode("unicode-escape")

or:

for i in range(0,len(g1)):
    g1[i] = g1[i].decode("unicode-escape")

This assumes g1 is the list containing the strings and g is the variable containing the whole file as a string.

I got my answer from:

Python: Sanitize a string for unicode?
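
Another way to look at the problem: the bytes \x93 and \x94 are the left and right curly double quotes in Windows-1252. If the file was saved in that encoding (an assumption, the question does not say), the text can be decoded first and the quote characters stripped afterwards. A minimal Python 3 sketch, with the raw bytes standing in for the file contents:

```python
# Python 3 sketch, assuming the file is Windows-1252 (cp1252) encoded.
raw = b'\x93SHEEP\x94,\x94TIGER\x94,\x94LION\x94'

# Decode the bytes into text; \x93 and \x94 become U+201C and U+201D.
text = raw.decode('cp1252')

# Split on commas, then strip surrounding whitespace and curly quotes from each word.
words = [w.strip().strip('\u201c\u201d') for w in text.split(',')]
print(words)  # ['SHEEP', 'TIGER', 'LION']
```

With a real file, open("animals.txt", encoding="cp1252") would do the decoding step while reading.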

Community
jkd