
I'm trying to extract some information from a txt file, but after splitting a line I can only access the first position of the list. Does anyone have an idea why?

My txt file looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmconvert 0.8.4" timestamp="2017-03-06T01:59:59Z">
    <bounds minlat="43.48" minlon="-79.7899999" maxlat="43.92" maxlon="-78.9999997"/>
    <node id="699540" lat="43.6751621" lon="-79.361332" version="1"/>
    <node id="699569" lat="43.7247576" lon="-79.3302633" version="1"/>
    <node id="1497736" lat="43.731285" lon="-79.3304523" version="1"/>
    <node id="1497764" lat="43.7412456" lon="-79.332082" version="1"/>
    <node id="1497766" lat="43.7418685" lon="-79.3321184" version="1"/>
    <node id="1497768" lat="43.7450436" lon="-79.3327357" version="1"/>
    <node id="1497773" lat="43.7459924" lon="-79.3329589" version="1"/>
    <node id="1497776" lat="43.747316" lon="-79.3332228" version="1"/>
    <node id="1497778" lat="43.7484115" lon="-79.3333255" version="1"/>

And my code:

import re
contador = 0  # counts the matching lines

pattern = re.compile("node")

with open('toronto1.txt') as text:
    print("leu Arquivo")  # "file read"
    with open('saida.txt', 'w') as saida:
        print("criou arquivo")  # "file created"
        for text_line in text:
            comparacao = re.search(pattern, text_line)
            if comparacao is not None:
                node_line = text_line
                split_id = re.findall(r"[\w']+", node_line)
                saida.write(split_id[2] + '\n')
                contador = contador + 1
        print(contador)
# both files are closed automatically when the "with" blocks end

a = split_id[2]

print(node_line)
print(split_id)
print(a)

Thanks for the help.

LucasD
  • Fix your indentation, please. – juanpa.arrivillaga Mar 14 '17 at 21:24
  • Probably because the list returned by `re.findall` is only finding a single match, so your list will only have a single element. I am unsure what you are expecting... – juanpa.arrivillaga Mar 14 '17 at 21:25
  • Stop trying to parse XML with a regex and use a DOM parser instead. See [this post](http://stackoverflow.com/a/1732454/62576). – Ken White Mar 14 '17 at 21:30
  • @juanpa.arrivillaga I don't understand the edit history. It looks like you've sprung an XML into life? What am I missing? – roganjosh Mar 14 '17 at 21:30
  • @roganjosh I had to wrap it in "code" or else it was being interpreted as HTML to do actual formatting. – juanpa.arrivillaga Mar 14 '17 at 21:31
  • First time using Stack Overflow, sorry about the indentation; I edited it. Looking for a DOM parser. Thank you all. – LucasD Mar 14 '17 at 21:33
  • @juanpa.arrivillaga Thanks, I did not know that. Looking at the initial question, I thought it had been forgotten to be included. – roganjosh Mar 14 '17 at 21:33
  • @juanpa.arrivillaga This is my list after the re.findall: ['node', 'id', '365507731', 'lat', '43', '7209752', 'lon', '79', '5030589', 'version', '1'] – LucasD Mar 14 '17 at 21:44
  • I guess `a = split_id[2]` is only giving you 1497778, which is the last id value. To get all ids, move `a = split_id[2]` (and a print of it) inside the for loop, because split_id changes on every line. –  Mar 14 '17 at 21:46
  • @Mimx Yeah, I put `a = split_id[2]` there to test whether the last one is caught, and it is! But if I put it inside the loop, I get the same "index out of range" error. Do you have any idea why? – LucasD Mar 14 '17 at 21:53
  • @Mimx Thank you so much, you were the only one who really tried to help me. You rock! – LucasD Mar 14 '17 at 22:53
  • @Mimx The indentation on this page was an error in the post; the indentation in my code is right. Btw, I found the error: when the code finds a line like: it raises the "out of range" error. The strange thing is that the `if comparacao is not None` check should skip that line. Now I don't know what to do. – LucasD Mar 15 '17 at 00:52
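
A minimal sketch of the likely cause discussed in the comments, assuming the offending line is a closing `</node>` tag (OSM XML closes a `<node>` that has child `<tag>` elements this way; the actual line from the comment above did not survive). Such a line passes `re.search(pattern, ...)` because it contains `node`, but `re.findall(r"[\w']+", ...)` returns only `['node']`, so `split_id[2]` raises `IndexError`. The in-memory `SAMPLE` and the `extract_ids` helper are hypothetical stand-ins for `toronto1.txt` and the question's loop, with a length guard added.

```python
import io
import re

# Hypothetical excerpt standing in for toronto1.txt; the </node> line is an assumption.
SAMPLE = '''<osm version="0.6">
    <node id="699540" lat="43.6751621" lon="-79.361332" version="1"/>
    <node id="699569" lat="43.7247576" lon="-79.3302633" version="1">
        <tag k="highway" v="traffic_signals"/>
    </node>
</osm>
'''

pattern = re.compile("node")


def extract_ids(text):
    """Same logic as the question's loop, but skipping lines with too few tokens."""
    ids = []
    for text_line in text:
        if pattern.search(text_line) is None:
            continue
        split_id = re.findall(r"[\w']+", text_line)
        # '</node>' contains "node" but tokenizes to ['node'], so split_id[2]
        # would raise IndexError; skip any line that is too short.
        if len(split_id) < 3:
            continue
        ids.append(split_id[2])
    return ids


print(re.findall(r"[\w']+", "    </node>\n"))  # ['node']
print(extract_ids(io.StringIO(SAMPLE)))        # ['699540', '699569']
```

The same tokenizer also splits `lat="43.7209752"` into `'43'` and `'7209752'` and drops the minus sign from `lon`, as the list in the comments shows, which is one more reason to prefer a real XML parser over a regex here.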

1 Answer


I am pretty sure you are getting a file that has only one big line; since iterating over a file splits it on `\n`, the loop body only runs once.

Do as others have told you and parse it the way it is supposed to be parsed: with an XML parser.
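
A minimal sketch of that approach, assuming Python's standard-library `xml.etree.ElementTree` (the answer does not name a specific parser). The `SAMPLE` bytes are a hypothetical excerpt shaped like the file in the question, and `read_nodes` is an illustrative helper, not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the same shape as toronto1.txt (an OSM XML file).
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmconvert 0.8.4">
    <node id="699540" lat="43.6751621" lon="-79.361332" version="1"/>
    <node id="699569" lat="43.7247576" lon="-79.3302633" version="1">
        <tag k="highway" v="traffic_signals"/>
    </node>
</osm>
"""


def read_nodes(source):
    """Return (id, lat, lon) for every <node> element in an OSM XML source.

    `source` may be a filename or a binary file object.
    """
    nodes = []
    # iterparse reads the document incrementally; the "end" event fires once
    # an element and all of its children have been parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            nodes.append((elem.get("id"), float(elem.get("lat")), float(elem.get("lon"))))
            elem.clear()  # free the node's attributes and children once read
    return nodes


for node_id, lat, lon in read_nodes(io.BytesIO(SAMPLE)):
    print(node_id, lat, lon)
```

With the real file, `read_nodes('toronto1.txt')` should behave the same way, since `iterparse` also accepts a filename and the `.txt` extension does not matter to the parser. Because attributes are read by name, `</node>` closing tags and `lat`/`lon` values with decimal points or minus signs need no special handling.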

sebastianf182