Using Python 2, I am reading strings from a variable (the text of an XML tag) and storing them in a list.
First, the strings contain special characters, and when I print them they do not show up correctly, even though I am using encode("ISO-8859-1").
Second, each string shows up in its own list, but I want them all in the same list.
import lxml.objectify
from lxml import etree
import codecs
import xml.etree.cElementTree as ET
file_path = "C:\Users\HP\Downloads\Morphalou-2.0.xml"
for event, elem in ET.iterparse(file_path, events=("start", "end")):
    if elem.tag == 'orthography' and event == 'start':
        data = elem.text
        my_list = []
        if data is not None:
            for i in data.split('\n'):
                my_list.append(i.encode("ISO-8859-1"))
            print(my_list)
This is what I am getting:
['abiotique']
['abiotiques']
[u'abi\xe9tac\xe9e']
[u'abi\xe9tac\xe9e']
[u'abi\xe9tac\xe9es']
[u'abi\xe9tin']
[u'abi\xe9tin']
[u'abi\xe9tins']
[u'abi\xe9tine']
[u'abi\xe9tines']
This is what I am expecting:
['abiotique','abiotiques','abiétacée',...]
Does anyone know how to fix this? Thanks.
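One possible direction, offered as a sketch and not a definitive fix: create the list once, before the loop, so every word lands in the same list, and read the text on the "end" event, since the element's text is not guaranteed to be set yet on "start". In Python 2, printing a list shows the repr of each item, which is why the escapes such as \xe9 appear, so the sketch prints the joined items instead of the list itself. The sample XML and the function name collect_orthographies are hypothetical stand-ins; only the orthography tag name comes from the code above. It uses xml.etree.ElementTree instead of cElementTree so that it also runs on Python 3.

```python
from __future__ import print_function

import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file; only the <orthography>
# tag name is taken from the question.
SAMPLE_XML = u"""<?xml version="1.0" encoding="UTF-8"?>
<lexicon>
  <orthography>abiotique</orthography>
  <orthography>abiotiques</orthography>
  <orthography>abi\u00e9tac\u00e9e</orthography>
</lexicon>
""".encode("utf-8")


def collect_orthographies(source):
    """Return the text of every <orthography> element as one flat list."""
    words = []  # created once, outside the loop, so all words share it
    # Only "end" events: by then the element's text has been fully parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "orthography" and elem.text:
            words.append(elem.text.strip())
        elem.clear()  # free memory, useful for a large file
    return words


# In real use, pass the file path instead of the in-memory sample.
words = collect_orthographies(io.BytesIO(SAMPLE_XML))

if __name__ == "__main__":
    # Print the items themselves, not the list, so accents are rendered
    # by the console instead of being shown as escape sequences.
    print(u", ".join(words))
```

With the sample data, the words end up in a single list of three items, and the joined line shows the accented characters, provided the console encoding supports them. No explicit encode call is used here; the strings are kept as text until they are printed.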