I am trying to read an XML file in an Azure Databricks notebook using PySpark. The problem is that my persons.xml has some comments at the beginning. I just want to ignore them when reading the file.
df = (
    spark.read
    .format("com.databricks.spark.xml")
    .option("rowTag", "person")
    # load() is the PySpark entry point for spark-xml; .xml() is its Scala helper
    .load("src/main/resources/persons.xml")
)
My XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!--
<top>
    <t1 attr1="a1">
        <!-- t1 comment -->
        <t2>Something 1</t2>
    </t1>
    <!-- between rows comment -->
    <t1 attr1="a2">
        <t2>Something 2</t2>
    </t1>
</top>
-->
<naman>
    <t1 attr1="a1">
        <t2>Something 1</t2>
    </t1>
    <t1 attr1="a2">
        <t2>Something 2</t2>
    </t1>
</naman>
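
One workaround I have been considering is to strip the comments from the file text before handing it to the reader. The sketch below is plain Python and does not touch Spark. The helper name `strip_xml_comments` is my own, not part of spark-xml. It assumes the file is small enough to hold in memory as a single string. It counts comment openers and closers, because the commented-out block above contains further `<!-- ... -->` comments inside it, and a simple non-greedy regex would stop at the first `-->` and leave `</top> -->` behind.

```python
import re

# Matches either a comment opener or a comment closer.
_TOKEN = re.compile(r"<!--|-->")


def strip_xml_comments(text: str) -> str:
    """Remove XML comments, tolerating comment openers nested inside a comment.

    Hypothetical helper for illustration; not part of spark-xml.
    """
    kept = []   # pieces of text that lie outside any comment
    depth = 0   # how many "<!--" are currently open
    pos = 0     # start of the next piece of text to keep
    for match in _TOKEN.finditer(text):
        if match.group() == "<!--":
            if depth == 0:
                # Entering a comment: keep everything before it.
                kept.append(text[pos:match.start()])
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                # Leaving the outermost comment: resume keeping text here.
                pos = match.end()
        # A "-->" seen outside any comment is left in place as ordinary text.
    if depth == 0:
        kept.append(text[pos:])
    # If a comment is never closed, everything after its opener is dropped.
    return "".join(kept)
```

In a notebook, the cleaned string would still have to be written back to a temporary path (for example with `dbutils.fs.put`) and then read with the `spark.read` call above, so I am not sure this is the most efficient approach for large files.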