3

I have converted an xml file into json using xml2json.

A small section of it looks as below. I want to convert this into csv. I am using csvkit in2csv

Using base syntax shows an error, simple.

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json > test2.csv
When converting a JSON document with a top-level dictionary element, a key must
be specified.

So adding the key I get no errors, but also no output.

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races" > te
st2.csv
'//Meeting/Races'

C:\Users\Renshaw\Documents\Sayth\XML>in2csv test2.json -k "//Meeting/Races/RaceE
ntries/RaceEntry" > test2.csv
'//Meeting/Races/RaceEntries/RaceEntry'

I have tried a wide range of keys now and get no error but also no output, any idea on how to make it supply output in csv?

{
  "Meeting": {
    "NumOfRaces": {
      "#tail": "\n  ",
      "#text": "9"
    },
    "WeightsPublishing": {
      "#tail": "\n  ",
      "#text": "2014-09-30T00:00:00+10:00"
    },
    "NominationsClose": {
      "#tail": "\n  ",
      "#text": "2014-09-29T12:00:00+10:00"
    },
    "CodeType": {
      "#tail": "\n  ",
      "#text": "GALLOPS"
    },
    "Track": {
      "Rainfall": {
        "#tail": "\n    ",
        "#text": "Nil last 24hrs, 4.2mm last 7 days"
      },
      "Irrigation": {
        "#tail": "\n    ",
        "#text": "Nil last 24hrs, 25mm last 7 days"
      },
      "RailPosition": {
        "#tail": "\n    ",
        "#text": "+9m Entire Circuit"
      },
      "#tail": "\n  ",
      "TrackSurface": {
        "#tail": "\n    ",
        "#text": "Turf"
      },
      "Comments": {
        "#tail": "\n    ",
        "#text": "Finalised 4\/10 - 7:45am  Late Scratching Race 3 No. 4"
      },
      "Weather": {
        "#tail": "\n    ",
        "#text": "Fine"
      },
      "Penetrometer": {
        "#tail": "\n    ",
        "#text": "4.83"
      },
      "RailPositionLastMeeting": {
        "#tail": "\n    ",
        "#text": "True Position Entire Circuit"
      },
      "TrackInfo": {
        "#tail": "\n  ",
        "#text": "Penetrometer: Inside 4.85, Outside 4.85"
      },
      "TrackRating": {
        "#tail": "\n    ",
        "#text": "Good"
      },
      "#text": "\n    ",
      "RacingDirection": {
        "#tail": "\n    ",
        "#text": "AntiClockwise"
      }
    },
    "MeetingStage": {
      "#tail": "\n  ",
      "#text": "Acceptances"
    },
    "Races": {
      "#tail": "\n",
      "#text": "\n    ",
      "Race": [
        {
          "Comments": {
            "#tail": "\n    "
          },
          "NominationsDivisor": {
            "#tail": "\n      ",
            "#text": "0"
          },
          "Starters": {
            "#tail": "\n      ",
            "#text": "11"
          },
          "TrackRecords": {
            "#tail": "\n      ",
            "TrackRecord": {
              "TrackRecordHorse": {
                "#tail": "\n        "
              },
              "#text": "\n          ",
              "#tail": "\n      ",
              "DistanceRace": {
                "#tail": "\n          ",
                "#text": "1000"
              },
              "Time": {
                "#tail": "\n          ",
                "#text": "00:00:55.420"
              },
              "RaceNumber": {
                "#tail": "\n          ",
                "#text": "7"
              },
              "RaceDate": {
                "#tail": "\n          ",
                "#text": "2013-02-16"
              }
            },
            "#text": "\n        "
          },
          "RaceDistance": {
            "#tail": "\n      ",
            "#text": "1000"
          },
          "NominationsRaceNumber": {
            "#tail": "\n      ",
            "#text": "1"
          },
          "ApprenticeCanClaim": {
            "#tail": "\n      ",
            "#text": "false"
          },
          "SizeField": {
            "#tail": "\n      ",
            "#text": "16"
          },
          "NameRaceForm": {
            "#tail": "\n      ",
            "#text": "MARIBYRNONG TRL"
          },
          "RaceType": {
            "#tail": "\n      ",
            "#text": "Flat"
          },
          "SizeEmergency": {
            "#tail": "\n      ",
            "#text": "4"
          },
          "DistanceApprox": {
            "#tail": "\n      ",
            "#text": "false"
          },
          "#text": "\n      ",
          "BallotedOutEntries": {
            "#tail": "\n      "
          },
          "Logos": {
            "#tail": "\n      ",
            "Logo": {
              "#tail": "\n      "
            },
            "#text": "\n        "
          },
          "#tail": "\n    ",
          "TrackCircumference": {
            "#tail": "\n      ",
            "#text": "2313"
          },
          "NameRaceNews": {
            "#tail": "\n      ",
            "#text": "Maribyrnong Trial Stakes"
          },
          "WeightChange": {
            "#tail": "\n      ",
            "#text": "0.00"
          },
          "Accepters": {
            "#tail": "\n      ",
            "#text": "12"
          },
          "RaceEntries": {
            "RaceEntry": [
              {
                "Trainer": {
                  "Location": {
                    "#tail": "\n            ",
                    "#text": "Cranbourne"
                  },
                  "#text": "\n            ",
                  "Surname": {
                    "#tail": "\n            ",
                    "#text": "Laing"
sayth
  • 6,696
  • 12
  • 58
  • 100

2 Answers2

3

There are two problems with what you are doing.

Firstly, you are specifying the key incorrectly (you are doing it XML/XPath-style, with slashes, when you are dealing with JSON in this case). You should simply provide the name of the element (e.g. Meeting).

However, the main problem is the type of JSON you are using, which consists of multiple nested dictionaries, which in2csv can't really deal with (with several levels, how would it know which columns to use?). You need to somehow flatten your data so that fields can be identified clearly.

You could look into this question for ideas into how to convert JSON into CSV, because I don't think in2csv is going to cut it in your case.

Community
  • 1
  • 1
Dologan
  • 4,554
  • 2
  • 31
  • 33
1

If what you are looking for is to convert each XML path into a path expression, use that for column 1 in your CSV, and use the value at the lowermost level for column 2, the following code might solve your problem:

import json

json_input = """{
  "Meeting": {
    "NominationsClose": {
      "#tail": "\\n  ",
      "#text": "2014-09-29T12:00:00+10:00"
    },
    "CodeType": {
      "#tail": "\\n  ",
      "#text": "GALLOPS"
    },
    "Track": {
      "Rainfall": {
        "#tail": "\\n    ",
        "#text": "Nil last 24hrs, 4.2mm last 7 days"
      },
      "Irrigation": {
        "#tail": "\\n    ",
        "#text": "Nil last 24hrs, 25mm last 7 days"
      }
    }
  }
}"""

def print_csv_depth_first(tree, path=""):
    if isinstance(tree, dict):
        for key in tree.keys():
            print_csv_depth_first(tree[key], "{}/{}".format(path, key))
    elif isinstance(tree, list):
        for i in range(len(tree)):
            print_csv_depth_first(tree[i], "{}/{}".format(path, str(i)))
    elif isinstance(tree, str):
        entry = tree
        print('{},{}'.format(path, repr(entry)))
        return

json = json.loads(json_input)
print_csv_depth_first(json)

I have included a small part of your example JSON data. At its very bottom your data also contains the beginning of a list, "RaceEntry": [, but that is incomplete, so I had to extrapolate. The above code produces the following output:

/Meeting/NominationsClose/#tail,'\n  '
/Meeting/NominationsClose/#text,'2014-09-29T12:00:00+10:00'
/Meeting/CodeType/#tail,'\n  '
/Meeting/CodeType/#text,'GALLOPS'
/Meeting/Track/Rainfall/#tail,'\n    '
/Meeting/Track/Rainfall/#text,'Nil last 24hrs, 4.2mm last 7 days'
/Meeting/Track/Irrigation/#tail,'\n    '
/Meeting/Track/Irrigation/#text,'Nil last 24hrs, 25mm last 7 days'

You will have to adapt the line containing the print statement to suit your needs.

Lutz Prechelt
  • 36,608
  • 11
  • 63
  • 88