In most cases where we overwrite an XML schema in Azure Dataflows, we have no problems. With a file we recently received, however, any change to the schema, even simply changing a node to an array, causes downstream transformations to stop working.
The flow is very basic, the same as we would have with any other XML file:
The Derived Column step, which normally works:
Note: The type mismatch error is a bit misleading; opening the expression builder gives a clearer idea of what's going on.
Note: Here it says that the field "payload.PIDXPlannedMovement.{d1p1.Header}" doesn't exist, even though I picked it out of the Input Schema.
Here's a snippet of the XML we're working with:
<?xml version='1.0' encoding='UTF-8' ?>
<payload>
    <PIDXPlannedMovement xmlns="http://www.pidx.org/schema/ds/v5.02" xmlns:d1p1="http://www.pidx.org/schema/ds/v5.02" d1p1:MajorVersion="05" d1p1:MinorVersion="02" d1p1:FixVersion="02">
        <d1p1:Header>
            <d1p1:DocumentIdentifier>1234</d1p1:DocumentIdentifier>
            <d1p1:DateTime>2023-01-13T05:21:32</d1p1:DateTime>
            <d1p1:From/>
            <d1p1:To/>
        </d1p1:Header>
        <d1p1:Documents>
            <d1p1:Document>
                ... *removed for brevity*
            </d1p1:Document>
        </d1p1:Documents>
    </PIDXPlannedMovement>
</payload>
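One thing worth noting about this snippet is that the default namespace and the d1p1 prefix are bound to the same URI, so the unprefixed PIDXPlannedMovement element and the prefixed d1p1:Header element are in the same namespace as far as XML is concerned. A minimal Python sketch to confirm that outside of Data Factory follows. The SAMPLE string is a cut-down copy of the snippet with an empty Document element standing in for the removed content, and the helper name qualified_tags is made up for illustration. It only shows how a standard XML parser resolves the names, not how Dataflow handles them.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the snippet above. The empty <d1p1:Document/> is a
# placeholder for the content that was removed for brevity.
SAMPLE = """<payload>
<PIDXPlannedMovement xmlns="http://www.pidx.org/schema/ds/v5.02" xmlns:d1p1="http://www.pidx.org/schema/ds/v5.02" d1p1:MajorVersion="05" d1p1:MinorVersion="02" d1p1:FixVersion="02">
<d1p1:Header>
<d1p1:DocumentIdentifier>1234</d1p1:DocumentIdentifier>
</d1p1:Header>
<d1p1:Documents>
<d1p1:Document/>
</d1p1:Documents>
</PIDXPlannedMovement>
</payload>"""


def qualified_tags(xml_text):
    """Return every element name in document order as '{namespace-uri}local-name'."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


tags = qualified_tags(SAMPLE)
for tag in tags:
    print(tag)
```

The payload root prints without a namespace because the xmlns declarations only appear on its child, while PIDXPlannedMovement and everything under it resolve to the same namespace URI whether or not they carry the d1p1 prefix.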
Note: In this instance Dataflow didn't detect that the Document node should be an array, so we overwrite the schema on the Projection tab with the following change:
(
    {@d1p1:FixVersion} as short,
    {@d1p1:MajorVersion} as short,
    {@d1p1:MinorVersion} as short,
    {@xmlns} as string,
    {@xmlns:d1p1} as string,
    PIDXPlannedMovement as (
        {d1p1:Documents} as (
            {d1p1:Document} as (
                ... *removed for brevity*
            )[]
        ),
        {d1p1:Header} as (
            {d1p1:DateTime} as string,
            {d1p1:DocumentIdentifier} as short,
            {d1p1:From} as string,
            {d1p1:To} as string
        )
    )
)
Note: The only change is adding "[]" to the "{d1p1:Document}" part of the expression.
Inspect tab before overwriting the schema:
Inspect tab after overwriting the schema:
Note: The node is correctly shown as an array.
None of our other dataflows that use XML have XML namespace prefixes, so I'm leaning towards the prefixes being the cause, but I haven't been able to confirm that this is in fact the problem or find out how to fix it.
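One way to test the namespace theory would be to run a copy of the file with the prefixes stripped through the same flow and see whether the schema overwrite then behaves. Below is a rough Python sketch of that preprocessing step using only the standard library. The function name strip_namespaces and the SAMPLE string (a cut-down copy of the snippet, with an empty Document element as a placeholder) are made up for illustration. This is a diagnostic aid under the assumption that the prefixes are the trigger, not a confirmed fix.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the snippet from the question. The empty <d1p1:Document/>
# is a placeholder for the content that was removed for brevity.
SAMPLE = """<payload>
<PIDXPlannedMovement xmlns="http://www.pidx.org/schema/ds/v5.02" xmlns:d1p1="http://www.pidx.org/schema/ds/v5.02" d1p1:MajorVersion="05" d1p1:MinorVersion="02" d1p1:FixVersion="02">
<d1p1:Header>
<d1p1:DocumentIdentifier>1234</d1p1:DocumentIdentifier>
</d1p1:Header>
<d1p1:Documents>
<d1p1:Document/>
</d1p1:Documents>
</PIDXPlannedMovement>
</payload>"""


def strip_namespaces(xml_text):
    """Return the XML with namespaces removed from element and attribute names."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree stores qualified names as '{uri}local'; keep only 'local'.
        if "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]
        renamed = {key.split("}", 1)[-1]: value for key, value in element.attrib.items()}
        element.attrib.clear()
        element.attrib.update(renamed)
    # With no qualified names left, the serializer emits no xmlns declarations.
    return ET.tostring(root, encoding="unicode")


stripped = strip_namespaces(SAMPLE)
print(stripped)
```

If the stripped copy works with the array change, that would point at the prefixes. If it fails in the same way, the cause is probably elsewhere.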
Any help would be appreciated. I'm sure someone has faced this.