0

I have an XML file that has multiple nodes, but not every child node is in every node (Not 100% I am using that terminology correctly). I need to export the child nodes into a CSV as column headers. If the child node is missing from the first node, than that childnode is not included as a header and the data is not pulled.

Example:

<data>
    <row>
        <TEST_REF_ID>10001001</TEST_REF_ID>
        <TEST_FIRST_NAME>Samuel</TEST_FIRST_NAME>
        <TEST_LAST_NAME>Adams</TEST_LAST_NAME>
        <TEST_SEX>M</TEST_SEX>
        <TEST_BIRTHDATE>01/01/2001</TEST_BIRTHDATE>
    </row>
    <row>
        <TEST_REF_ID>20002002</TEST_REF_ID>
        <TEST_FIRST_NAME>John</TEST_FIRST_NAME>
        <TEST_MIDDLE_NAME>R</TEST_MIDDLE_NAME>
        <TEST_LAST_NAME>Smith</TEST_LAST_NAME>
        <TEST_SEX>M</TEST_SEX>
        <TEST_BIRTHDATE>01/01/1999</TEST_BIRTHDATE>
        <TEST_SSN>123456789</TEST_SSN>
    </row>
</data>

If I do this:

#Location of XML Data is in a URL
[xml]$export = Invoke-WebRequest -Uri $url -Credential $credential
$export.data.row | Export-Csv "C:\test.csv" -NoTypeInformation

Columns TEST_MIDDLE_NAME and TEST_SSN would not be included in the CSV.

If I do this:

$columns = 'TEST_REF_ID', 'TEST_FIRST_NAME', 'TEST_MIDDLE_NAME',
           'TEST_LAST_NAME', 'TEST_SEX', 'TEST_BIRTHDATE', 'TEST_SNN'
[xml]$export = Invoke-WebRequest -Uri $url -Credential $credential
$export.data.row |
    Select-Object $columns |
    Export-Csv "C:\test.csv" -NoTypeInformation

all the columns are included as long as I know what they are before hand. If the column name changes, then it will not be included and the column I have incorrect will just have no data.

If I do this:

#Location of XML Data is in a URL
[xml]$export = Invoke-WebRequest -Uri $url -Credential $credential
$export.SelectNodes("data/row/*") | Select-Object -Expand Name -Unique

It will give me all the column names, which I could use as the headers, but it does not give them to me in the correct order. It gives them back in the order they are retrieved, so in this case it would return: TEST_REF_ID, TEST_FIRST_NAME, TEST_LAST_NAME, TEST_SEX, TEST_BIRTHDATE, TEST_MIDDLE_NAME, TEST_SNN.

It brings back the first set in a row, then adds on the two that were missing front the first set on to the end of the list. I need them to be pulled back in the correct order as displayed in the nodes.

I hope that all makes sense, using the $Column method does do it exactly what I need, but it breaks if changes are made with out my knowledge, I am hoping to remedy that possibility if possible.

Ansgar Wiechers
  • 193,178
  • 25
  • 254
  • 328
moore1emu
  • 476
  • 8
  • 27
  • 1
    *"I need them to be pulled back in the correct order as displayed in the nodes."* Why? – Ansgar Wiechers Jan 31 '19 at 18:33
  • 3
    What is the "correct" order? Imagine there are 100 `` elements, and they all have slightly different columns. Which one will you pick to define the "correct" order? – Tomalak Jan 31 '19 at 18:34
  • 2
    Gather all the required elements first into an object before you export, if ordering is important then do that just before/as part of the export. – Scepticalist Jan 31 '19 at 18:42
  • 2
    If there is at least one node with a complete child node list you could pick the longest list. If that is not the case you need to merge all unique node lists into one. This is not a trivial task, even if we don't account for conflicting element order. – Ansgar Wiechers Jan 31 '19 at 18:45
  • @AnsgarWiechers The csv file is then uploaded to an oracl sql table via a control file. If the columns do not match, the data would be imported into the wrong column – moore1emu Jan 31 '19 at 19:06
  • @tomalak The correct order is hard to explain, but would be the one with the complete list of data. put another way the correct order would mean the placement matches all the existing data.rows, if that makes sense – moore1emu Jan 31 '19 at 19:10
  • @AnsgarWiechers I tried to create a unique node list, with that one script. but it does not place the items in the order that i need. how would i go about choosing the longest list? If its not something simple I am missing, or is just ridicules than i will just use the $Column method i am currently using. just didnt know if there was something i was missing that would give me what i wanted. – moore1emu Jan 31 '19 at 19:12
  • @Scepticalist how would i gather the elements into an object outside of how i am already doing it with $Column. I can retrieve a list of all the child node names, but not in the correct order. – moore1emu Jan 31 '19 at 19:13
  • @AnsgarWiechers thanks for the post edit, i did not know about the language display distinction – moore1emu Jan 31 '19 at 19:15
  • 1
    If you would define "correct" as *"the uniqe list of all columns that occur across all rows"*, then this would be doable. This gets harder if you put a lot of emphasis on *"in the order the order they appear in the document"*. If it's *"in any order as long as all columns are there and everything is consistent in the resulting CSV"* then this is a lot easier. – Tomalak Jan 31 '19 at 19:15
  • @Tomalak Your last comment about "in any order as long as all columns are there and everything is consistent in the resulting CSV" I might be able to make that work, how would I go about that. – moore1emu Jan 31 '19 at 19:18
  • 1
    You're already doing the first half of this with your `$cols = $export.SelectNodes("data/row/*") | Select-Object -Expand Name -Unique`. This gives you all columns that occur anywhere. The second half is just as simple `$export.data.row | Select $cols | ConvertTo-Csv -NoTypeInformation`. – Tomalak Jan 31 '19 at 19:25
  • @Tomalak I got just. unfortunately, i would need it to be in the same order everytime i did the transform. Doing this method will give me all the columns in a specific order, but the next time I run it it would be in a different order, which would be a problem. – moore1emu Jan 31 '19 at 19:41
  • 1
    I'm not familiar with Oracle DBs, but shouldn't [this](https://stackoverflow.com/a/37566622/1630171) import the data while mapping table columns to CSV columns? – Ansgar Wiechers Jan 31 '19 at 21:29
  • @AnsgarWiechers that is what i am using, but it requires the column names, like in the example. so if my CSV Column Names or order doesnt match what is in the Control file. Then it would error out. – moore1emu Jan 31 '19 at 22:49
  • AFAICS you know the order of columns after the export, and you can map the Oracle table columns to the CSV fields (`LOAD DATA INFILE input.csv INTO TABLE table ... (col2, col3, col1, ...)`). Where's the problem? – Ansgar Wiechers Jan 31 '19 at 23:02

0 Answers0