Firstly I followed this question, but I still have issues with the remove method.

tag.getparent().remove(tag)

I used this piece of code to remove the anchor tag with the attributes name="2" and id="2" in this webpage.

After the line is executed, I can still see the tag and its properties, and when I iterate through all the children I can still see the element that I deleted.

What exactly does the remove method do, and why does the deleted tag still persist?

This is the screenshot of the debugger after the line is executed.

[Image: debugger screenshot]

Satyaaditya

1 Answer

When you remove a node from its parent, the node itself still exists; it is simply detached from the parent. This allows you to append the "deleted" node to a different parent. But if you don't append the node to a new parent, then the node is as good as deleted from the perspective of the root node.
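A minimal sketch of this behaviour, assuming lxml is installed and using a made-up HTML fragment in place of the page from the question:

```python
from lxml import html

# Hypothetical markup standing in for the real page.
root = html.fromstring('<div><p>keep</p><a name="2" id="2">link</a></div>')
tag = root.find('.//a')

tag.getparent().remove(tag)

# The Python object is still alive and keeps its own data...
print(tag.tag, tag.get('id'))
# ...but it is detached: it has no parent any more...
print(tag.getparent())
# ...and a traversal that starts at the root no longer reaches it.
print([el.tag for el in root.iter()])
```

This would explain why a debugger still shows the tag and its properties when you inspect the `tag` variable directly: the variable keeps the detached element alive even though the tree no longer contains it.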

To preserve the children of the tag node being removed, you can move them to the tag's parent at the same index like this:

parent = tag.getparent()
index = parent.index(tag)
# Iterate in reverse so that we can keep inserting at the same index
# while preserving the original order.
for child in tag.getchildren()[::-1]:
    tag.remove(child)
    parent.insert(index, child)
# Note: remove() takes tag.text and tag.tail away together with the tag.
parent.remove(tag)
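One caveat that may explain the missing text discussed in the comments: lxml does not model text as child nodes. Text inside an element is stored in its `.text` attribute, and text that follows an element is stored in that element's `.tail`, so `remove()` carries the trailing text away with the element. The sketch below, which uses a hypothetical helper named `remove_keep_tail` and made-up markup, shows one way to keep the trailing text with plain `lxml.etree` elements:

```python
from lxml import etree

def remove_keep_tail(tag):
    """Remove tag and its subtree but keep the text that follows it.

    Hypothetical helper for illustration, not part of lxml.
    """
    parent = tag.getparent()
    if tag.tail:
        previous = tag.getprevious()
        if previous is not None:
            # Text after a preceding sibling lives in that sibling's tail.
            previous.tail = (previous.tail or '') + tag.tail
        else:
            # No preceding sibling: the text belongs to the parent's text.
            parent.text = (parent.text or '') + tag.tail
    parent.remove(tag)

# Made-up markup: the text "after" is the tail of <a>, not its child.
root = etree.fromstring('<p>before <a name="2" id="2"/>after</p>')
remove_keep_tail(root.find('a'))
print(etree.tostring(root))
```

Here the anchor has no children at all, which matches the empty `getchildren()` list mentioned in the comments, yet the text next to it would still vanish with a bare `remove()`.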

Or you can simply use the drop_tag method:

tag.drop_tag()
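Note that `drop_tag` is available on elements parsed with `lxml.html`, not on plain `lxml.etree` elements. The same module also offers `drop_tree`, which removes the element together with its children but keeps the text that follows it. A short sketch comparing the two on a made-up fragment:

```python
from lxml import html

# Hypothetical markup for illustration.
markup = '<p>before <a name="2" id="2"><b>inner</b></a> after</p>'

# drop_tag(): removes only the <a> wrapper; its children and text stay.
root_tag = html.fromstring(markup)
root_tag.find('.//a').drop_tag()
text_after_drop_tag = root_tag.text_content()

# drop_tree(): removes <a> and everything inside it, but keeps the
# trailing text (" after") by merging it into the surrounding text.
root_tree = html.fromstring(markup)
root_tree.find('.//a').drop_tree()
text_after_drop_tree = root_tree.text_content()

print(repr(text_after_drop_tag))
print(repr(text_after_drop_tree))
```

If the goal is to delete the anchor but keep the text next to it, `drop_tree` may be the closer fit; `drop_tag` suits the case where the anchor's contents should stay as well.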
blhsing
  • I need to remove it completely from the DOM, so that I don't get that node while iterating through the DOM. How can I achieve that? – Satyaaditya Oct 17 '18 at 06:37
  • If you traverse the DOM structure from the root node then the deleted node should not be included anymore. – blhsing Oct 17 '18 at 06:38
  • It is no longer included, but I ran into another issue. Before deletion, tag.text_content() has the actual text present in the tag, but after deletion it does not show any text. – Satyaaditya Oct 17 '18 at 06:46
  • When you delete a node, the entire tree under the deleted node goes along with it. Since the text node is a child node of the element node you deleted, it is gone as well. If you want the child nodes of the deleted node back, you need to iterate over the deleted node's children, remove each child from the deleted node, and insert it into the deleted node's parent at the position where the deleted node was. – blhsing Oct 17 '18 at 06:50
  • But the text is not inside the deleted node; it is actually a sibling of the deleted node. Please look at the webpage I attached in the question for reference, at any node with name and id attributes. – Satyaaditya Oct 17 '18 at 06:52
  • The text node is definitely a child node of the tag node. The very fact that you can use `tag.text_content()` to obtain the text means the text is a child of the tag. – blhsing Oct 17 '18 at 06:55
  • Then it should show up in tag.getchildren(), right? Please see the screenshot I attached: tag.getchildren() returns an empty list for the tag. – Satyaaditya Oct 17 '18 at 06:58
  • In that case, you can try moving the node's children to the node's parent before the node is deleted, since it appears that after the removal the associations with the children are also gone. – blhsing Oct 17 '18 at 07:03
  • Will tag.append(child) help with your suggestion? – Satyaaditya Oct 17 '18 at 07:06
  • I've updated my answer with an example of how you can do this. – blhsing Oct 17 '18 at 07:11
  • What exactly does it do? Will it help in the case of multiple children? – Satyaaditya Oct 17 '18 at 07:13
  • It does exactly what I described above (7 messages ago). As you can see from the code, it iterates through all the children of the node and inserts them into the node's parent at the position where the node is. – blhsing Oct 17 '18 at 07:15