My XML schema is as follows:
root
|-- generalinfo: string (nullable = true)
|-- info: string (nullable = true)
|-- info-info: string (nullable = true)
|-- metadata: struct (nullable = true)
| |-- element: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- files: array (nullable = true)
| |-- element: string (containsNull = true)
|-- parents: string (nullable = true)
|-- participants: array (nullable = true)
| |-- element: string (containsNull = true)
|-- signatures: array (nullable = true)
| |-- element: string (containsNull = true)
|-- size: string (nullable = true)
|-- system-path: string (nullable = true)
|-- satellite: string (nullable = true)
|-- event: array (nullable = true)
| |-- element: string (containsNull = true)
|-- user: string (nullable = true)
|-- version_id: string (nullable = true)
I've read the XML file with the following code:
df = (
    sqlContext.read
    .format("com.databricks.spark.xml")
    .option("rowTag", "product")
    .option("attributePrefix", "_")
    .load("datafiles/product.xml")
)
My problem is that all of the tags come back empty in the resulting DataFrame.
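One sanity check that may help narrow this down is to look at the XML outside of Spark and confirm that an element named by "rowTag" actually exists and has the expected children. The sketch below uses only Python's standard library. The helper inspect_row_tag and the sample document are hypothetical stand-ins (the sample is not the real datafiles/product.xml), and none of this is part of spark-xml.

```python
import xml.etree.ElementTree as ET


def inspect_row_tag(xml_text, row_tag):
    """Return (count of <row_tag> elements, sorted child tag names of the first one)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree and also yields the root if its tag matches.
    # Matching is case-sensitive; namespaced files use tags like "{uri}product".
    rows = list(root.iter(row_tag))
    children = sorted({child.tag for child in rows[0]}) if rows else []
    return len(rows), children


# Hypothetical sample, only for illustration (NOT the real data file).
sample = """<products>
  <product>
    <user>alice</user>
    <size>42</size>
  </product>
</products>"""

print(inspect_row_tag(sample, "product"))  # one row element found
print(inspect_row_tag(sample, "Product"))  # wrong case: no row elements found
```

To run the same check on the real file, the text could be read with open("datafiles/product.xml").read() and passed in. A count of zero would suggest the "rowTag" value does not match any element name in the file.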