Suppose I have the following XML document, which I parse into an lxml tree:

my_data.xml

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
    <rank updated="yes">2</rank>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>

Parsing:

xt = etree.parse("my_data.xml")
xr = xt.getroot()

Now I want to create a list of copies of this tree. In this example, I create three copies:

f_list = [1, 2, 3]

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

Along with those 3 trees, I have a list of ramadan nodes, one from each tree. Now I want to duplicate the ramadan node in each of the 3 new trees and insert the duplicate right after the original, tree by tree.

for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)

I intend to add only ONE duplicated ramadan node to each tree. However, after running that code, each of the 3 duplicated trees contains FOUR ramadan nodes (1 being the original and 3 being added by the for loop above).

Why is this happening? Also, I notice that if I print the list of ramadan nodes with:

print(ramadan_nodes)

I get the same element, <Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, repeated exactly 3 times, as follows:

[<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, 
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, 
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>]

What is this number 0x203b4f849c0? I suspect it has something to do with the repeated duplication here. Could someone help explain? Thanks.

Below is the complete code:

import copy
import lxml.etree as etree

file_path = "my_data.xml"
xt = etree.parse(file_path)
xr = xt.getroot()

f_list = [1, 2, 3]

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)

print(ramadan_nodes)
etree.dump(xroots[0])
etree.dump(xroots[1])
etree.dump(xroots[2])

Update:

If I replace these two lines:

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]

with

xtrees = []
xroots = []
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())

I get the expected output. It seems that copy.deepcopy does not produce distinct objects when its result is placed in a list and the list is multiplied. Why is that?

Tristan Tran
  • The number is the "identity" of the element, as returned by `id()`: https://docs.python.org/3/library/functions.html?highlight=id#id. – mzjn Jun 01 '21 at 19:28
  • It is easier to help if you provide a single piece of complete code (including `import` statements) that we can copy, paste and run. Here you have posted fragments that have to be patched together. – mzjn Jun 01 '21 at 19:37
  • @mzjn Thanks. I updated with the full code. – Tristan Tran Jun 01 '21 at 19:41

1 Answer

You are using the * operator:

xtrees = [copy.deepcopy(xt)] * len(f_list)

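This does not create three copies. copy.deepcopy(xt) is evaluated only once, and the * operator then fills the list with three references to that single copied tree. Every change made through one list entry therefore shows up in all of them, which is also why print(ramadan_nodes) shows the same identity (0x203b4f849c0) three times.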
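The same behaviour can be reproduced without lxml. The sketch below is only an illustration: the nested dict and the names `original`, `aliased` and `independent` are made up for this example and stand in for the tree.

```python
import copy

# Hypothetical stand-in for the parsed tree (not from the question).
original = {"holidays": ["ramadan"]}

# List multiplication: deepcopy runs once, and the list holds
# three references to that one copy.
aliased = [copy.deepcopy(original)] * 3

# List comprehension: deepcopy runs on every iteration,
# so each entry is a separate object.
independent = [copy.deepcopy(original) for _ in range(3)]

aliased_ids = {id(item) for item in aliased}
independent_ids = {id(item) for item in independent}
print(len(aliased_ids))      # 1
print(len(independent_ids))  # 3

# A change made through one aliased entry is visible through the others.
aliased[0]["holidays"].append("extra")
print(aliased[2]["holidays"])      # ['ramadan', 'extra']

# The independent copies do not affect each other.
independent[0]["holidays"].append("extra")
print(independent[2]["holidays"])  # ['ramadan']
```

This matches what the question observed: the loop ran three times against what was really one tree, so that tree ended up with the original node plus three inserted ones.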

To get actual copies, you can do as follows:

xtrees = [copy.deepcopy(xt) for _ in range(len(f_list))]
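As a rough check, the loop from the question can be re-run with the comprehension in place. This is a sketch under assumptions: the XML is a trimmed inline version of my_data.xml (only the ethnicity branch) so that the snippet runs without the file, and the `counts` variable is added here purely for verification.

```python
import copy
import lxml.etree as etree

# Trimmed inline stand-in for my_data.xml (assumption: only the
# ethnicity branch matters for this check).
XML = b"""<data>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>"""

xt = etree.ElementTree(etree.fromstring(XML))
f_list = [1, 2, 3]

# One deepcopy call per list entry, so the trees are independent.
xtrees = [copy.deepcopy(xt) for _ in f_list]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

# Insert one duplicate right after the original node in each tree.
for node in ramadan_nodes:
    parent = node.getparent()
    parent.insert(parent.index(node) + 1, copy.deepcopy(node))

# Count the ramadan nodes per tree: the original plus one duplicate.
counts = [len(xtree.findall('.//{*}ramadan')) for xtree in xtrees]
print(counts)  # [2, 2, 2]
```

With the comprehension, print(ramadan_nodes) should also show three different identities instead of one repeated value.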

mzjn