I am trying to figure out why my code behaves differently under the debugger than in normal execution. I have seen these questions, but they do not match my case:

What to do, if debug behaviour differs from normal execution?

python2.7 using debug behave different then without debug

I'm parsing an XML document into a DataFrame so I can convert it to a CSV or Excel file. With normal execution, only the last "CPE" of each "LOCALIDADE" node ends up in the output.

This is a chunk of my xml file:

<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
        <CPE>PT2000022152399XK</CPE>
        <CPE>PT2000022152402BR</CPE>
        <CPE>PT2000022152424NT</CPE>
      </LOCALIDADE>
    </FREGUESIA>

    <FREGUESIA>
      <NOME_FREGUESIA>ALFANDEGA DA FE</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>ALFANDEGA DA FE</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022153052QF</CPE>
        <CPE>PT2000022153085VV</CPE>
        <CPE>PT2000022153108HV</CPE>
        <CPE>PT2000022153119LM</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>

This code works for me when I am debugging it:

import xml.etree.ElementTree as et
import pandas as pd

path = '/Path/toFile.xml'
data = []
for (ev,el) in et.iterparse(path):
        print (el.tag, el.text)        
        if el.tag == 'NOME_DISTRITO': nome = el.text 
        if el.tag == 'NOME_CONCELHO': nc = el.text
        if el.tag == 'NOME_FREGUESIA': nf = el.text
        if el.tag == 'NOME_LOCALIDADE': nl = el.text
        if el.tag == "LOCALIDADE":
            inner = {}
            inner['NOME_DISTRITO'] = nome
            inner['NOME_CONCELHO'] = nc
            inner['NOME_FREGUESIA'] = nf            
            for i in el:                               
                print (i.tag,i.text)
                print(data)
                inner[i.tag] = i.text
                if inner.has_key('CPE'):
                    data.append(inner)   
                                                
df = pd.DataFrame(data)
df.to_csv('/Users/DanielMelo/Documents/Endesa/Portugal/CPE.csv',columns=['CPE','NOME_CONCELHO','NOME_FREGUESIA',
                                     'NOME_LOCALIDADE','CODIGO_POSTAL'])

But this is the result when I run with normal execution:

CPE NOME_CONCELHO   NOME_FREGUESIA  NOME_LOCALIDADE CODIGO_POSTAL
PT2000022152424NT   ALFANDEGA DA FE AGROBOM AGROBOM 5350
PT2000022152424NT   ALFANDEGA DA FE AGROBOM AGROBOM 5350
PT2000022152424NT   ALFANDEGA DA FE AGROBOM AGROBOM 5350
PT2000022152424NT   ALFANDEGA DA FE AGROBOM AGROBOM 5350
PT2000022152424NT   ALFANDEGA DA FE AGROBOM AGROBOM 5350
PT2000022153119LM   ALFANDEGA DA FE ALFANDEGA DA FE ALFANDEGA DA FE 5350
PT2000022153119LM   ALFANDEGA DA FE ALFANDEGA DA FE ALFANDEGA DA FE 5350
PT2000022153119LM   ALFANDEGA DA FE ALFANDEGA DA FE ALFANDEGA DA FE 5350
PT2000022153119LM   ALFANDEGA DA FE ALFANDEGA DA FE ALFANDEGA DA FE 5350

I don't know whether the problem is in how I append the dict to my list, or some kind of conflict when converting to CSV (which I don't think is the case).

But as I said, it works and I get the result I want when I am debugging, so I cannot see what the problem is.

Juliana Rivera

1 Answer

You are repeatedly adding the same dictionary to the list. Python containers store references, not copies, so any alteration you make to that dictionary is going to be visible through all those references.

Printing the dictionary before you alter it in the next loop iteration shows its state at that moment, so the output looks correct at each step. You are not printing the dictionaries you already added, after all, so you don't see those references reflect the change.

Add a copy of the dictionary instead:

if inner.has_key('CPE'):
    data.append(inner.copy())
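
As a rough sketch of how that fix fits into the parsing loop, the example below runs the same logic against a shortened inline sample instead of the real file. The names `SAMPLE_XML` and `parse_cpe_rows` are made up for illustration, and `'CPE' in inner` is used in place of `has_key` (an assumption on my part, so the snippet also runs on Python 3).

```python
import io
import xml.etree.ElementTree as et

# Shortened, hypothetical sample modelled on the XML in the question.
SAMPLE_XML = u"""<DISTRITO>
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>"""


def parse_cpe_rows(source):
    """Return one dict per CPE element found in `source` (a path or file object)."""
    rows = []
    nome = nc = nf = None
    # iterparse yields "end" events by default, so a LOCALIDADE element
    # already has all of its children when it is seen here.
    for _event, el in et.iterparse(source):
        if el.tag == 'NOME_DISTRITO':
            nome = el.text
        elif el.tag == 'NOME_CONCELHO':
            nc = el.text
        elif el.tag == 'NOME_FREGUESIA':
            nf = el.text
        elif el.tag == 'LOCALIDADE':
            inner = {'NOME_DISTRITO': nome,
                     'NOME_CONCELHO': nc,
                     'NOME_FREGUESIA': nf}
            for child in el:
                inner[child.tag] = child.text
                if 'CPE' in inner:
                    # Append a snapshot; appending `inner` itself would
                    # store the same object again and again.
                    rows.append(inner.copy())
    return rows


rows = parse_cpe_rows(io.BytesIO(SAMPLE_XML.encode('utf-8')))
for row in rows:
    print(row['CPE'], row['NOME_LOCALIDADE'])
```

Each row is now its own dictionary, so later assignments to `inner['CPE']` no longer rewrite the rows already collected.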

You can easily reproduce your problem in an interactive session:

>>> data = []
>>> inner = {'foo': 'bar'}
>>> data.append(inner)
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> inner
{'foo': 'spam'}
>>> data  # note that the data list *also* changed!
[{'foo': 'spam'}]
>>> data = []  # start anew
>>> inner = {'foo': 'bar'}
>>> data.append(inner.copy())  # add a (shallow) copy
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> data
[{'foo': 'bar'}]
>>> data.append(inner.copy())
>>> data
[{'foo': 'bar'}, {'foo': 'spam'}]
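
If it helps, the same point can be checked with the `is` operator, and copying is not the only way out: building a new dictionary on every iteration works too. The sketch below uses made-up placeholder values (`base`, and the CPE strings `'A'`, `'B'`, `'C'`) purely for illustration.

```python
base = {'NOME_LOCALIDADE': 'AGROBOM'}
cpes = ['A', 'B', 'C']  # placeholder values, not real CPE codes

# Reusing one dictionary: the list ends up holding three references
# to the very same object.
shared = []
row = dict(base)
for cpe in cpes:
    row['CPE'] = cpe
    shared.append(row)

print(all(r is shared[0] for r in shared))  # True: one object, three references
print([r['CPE'] for r in shared])           # ['C', 'C', 'C']

# Building a new dictionary per iteration: every entry is independent.
fresh = []
for cpe in cpes:
    fresh.append(dict(base, CPE=cpe))

print(fresh[0] is fresh[1])                 # False
print([r['CPE'] for r in fresh])            # ['A', 'B', 'C']
```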
Martijn Pieters
  • aside: performance issue: a lot of `if` could be turned into `elif`, speed could be a lot better. – Jean-François Fabre Aug 03 '16 at 13:28
  • So, every time I want to append a dictionary to a list I need to append a copy? Not the same dictionary with a different value, but a copy? Thanks for your help! It works :) – Juliana Rivera Aug 03 '16 at 18:32
  • @JulianaRivera: if you don't create a copy, all you are doing is adding another reference; all references show the same dictionary data, so you'll get the same data in the CSV output, repeated. – Martijn Pieters Aug 03 '16 at 18:38