Suppose I have an original lxml tree as follows:
my_data.xml
<?xml version="1.0" encoding="UTF-8"?>
<data>
<country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
<rank updated="yes">2</rank>
<holidays>
<christmas>Yes</christmas>
</holidays>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
<continent>Asia</continent>
<holidays>
<christmas>Yes</christmas>
</holidays>
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
<ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
<malay>
<holidays>
<ramadan>Yes</ramadan>
</holidays>
</malay>
</ethnicity>
</data>
Parsing:
xt = etree.parse("my_data.xml")
xr = xt.getroot()
Now I want to create a list of duplicated trees. In this example, I create a list of 3 duplicated trees:
f_list = [1, 2, 3]
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]
Along with those 3 trees, I have a list of ramadan nodes, each of which should belong to an individual tree.
Now I want to duplicate the ramadan node in each of those 3 new trees and insert the copy right after the original, in each tree individually.
for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)
As shown above, I intend to have only ONE duplicated ramadan node in each tree. However, upon running that code, each of the 3 duplicated trees contains FOUR ramadan nodes (1 being the original and 3 being added by the for loop above).
Why is this happening? Also, I notice that if I print the list of ramadan nodes with:
print(ramadan_nodes)
I get <Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0> repeated exactly 3 times, as follows:
[<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>,
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>,
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>]
What is this number 0x203b4f849c0? I suspect it has something to do with the unexpected duplication here. Could someone help explain? Thanks.
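For context on that number: in CPython, id() returns an object's memory address, and the hex value after "at" in an element's repr appears to be that same address. Seeing one value three times would therefore suggest one object listed three times, not three separate objects. Below is a minimal sketch of that check. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without lxml (an assumption made for illustration; the list behaviour itself is plain Python and does not depend on the XML library).

```python
import copy
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml, for illustration only

node = ET.fromstring("<ramadan>Yes</ramadan>")

# deepcopy runs once; multiplying the one-item list repeats the reference to that copy.
nodes = [copy.deepcopy(node)] * 3

# "is" compares identity, which is what the address in the repr reflects.
same_object = all(n is nodes[0] for n in nodes)
addresses = {hex(id(n)) for n in nodes}

print(same_object)     # True
print(len(addresses))  # 1
```

If the three entries were independent copies, the set of addresses would hold three different values.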
Below is the complete code:
import copy
import lxml.etree as etree
file_path = "my_data.xml"
xt = etree.parse(file_path)
xr = xt.getroot()
f_list = [1, 2, 3]
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]
for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)
print(ramadan_nodes)
etree.dump(xroots[0])
etree.dump(xroots[1])
etree.dump(xroots[2])
Update:
If I replace these two lines:
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
with
xtrees = []
xroots = []
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
I get the expected output. It seems copy.deepcopy does not produce distinct objects when used in a list? Why is that?
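A minimal sketch of what seems to be going on: in [copy.deepcopy(xt)] * 3, deepcopy is called once and the list multiplication repeats a reference to that single copy, whereas a list comprehension calls deepcopy once per entry. The example below again uses the standard library's xml.etree.ElementTree instead of lxml so that it is self-contained; the trimmed-down XML, the helper name duplicate_ramadan, and the ".//ramadan/.." parent lookup (the stdlib has no getparent()) are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml, for illustration only

XML = "<data><malay><holidays><ramadan>Yes</ramadan></holidays></malay></data>"


def duplicate_ramadan(roots):
    """Insert one copy of the first <ramadan> right after it, once per list entry.

    Returns the number of <ramadan> elements found under each entry afterwards.
    """
    for root in roots:
        node = root.find(".//ramadan")
        parent = root.find(".//ramadan/..")  # parent lookup via path
        position = list(parent).index(node) + 1
        parent.insert(position, copy.deepcopy(node))
    return [len(root.findall(".//ramadan")) for root in roots]


original = ET.fromstring(XML)

# One deepcopy, referenced three times: every insert lands in the same tree.
shared = [copy.deepcopy(original)] * 3
# Three deepcopy calls: each entry is an independent tree.
separate = [copy.deepcopy(original) for _ in range(3)]

shared_counts = duplicate_ramadan(shared)
separate_counts = duplicate_ramadan(separate)

print(shared_counts)    # [4, 4, 4]
print(separate_counts)  # [2, 2, 2]
```

With lxml, the equivalent one-line change would presumably be xtrees = [copy.deepcopy(xt) for _ in f_list], which matches the behaviour of the three explicit append calls above.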