Is there a way to go directly from XML to Avro in Python?

From the documentation, it seems that there isn't a direct path... so far the workflow looks like this to me:

  1. Create a schema in JSON
  2. Read in each line of XML
  3. Parse the XML and assign each value to the corresponding JSON binding
  4. Read in the JSON-formatted XML document using the Python Avro reader and the JSON-formatted schema
  5. Close the Avro file

Is there a better (more direct) way?

anonygrits

1 Answer

If you can parse your XML into regular Python objects, you can skip the JSON and schema creation steps by using the rec-avro package.

It lets you take any Python data structure, including parsed XML or JSON, and store it in Avro without a dedicated schema.

I have tested it with Python 3.

You can install it with pip3 install rec-avro, or see the code and docs at https://github.com/bmizhen/rec-avro

I gave a JSON-to-Avro example here: https://stackoverflow.com/a/55444481/6654219. The answer applies to your case as well; you just need to change the json_objects() function to return your parsed XML.
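
As a rough illustration of the "parsed XML" half, here is a minimal sketch of what a replacement for json_objects() might look like, using only the standard library's xml.etree.ElementTree. The sample XML, the helper names (element_to_obj, xml_objects) and the "#text" key are all made up for the example; your real document layout will likely need different handling. It only covers turning XML into plain dicts, lists and strings; writing those records to Avro would then follow the linked answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; element and attribute names are invented.
SAMPLE_XML = """
<users>
  <user id="1"><name>Ann</name><email>ann@example.com</email></user>
  <user id="2"><name>Bob</name></user>
</users>
"""


def element_to_obj(elem):
    """Convert an Element into plain dicts, lists and strings."""
    children = list(elem)
    if not children and not elem.attrib:
        # Leaf element with no attributes: keep only its text.
        return (elem.text or "").strip()

    # Start from the attributes, then add one key per child tag.
    obj = dict(elem.attrib)
    for child in children:
        value = element_to_obj(child)
        if child.tag in obj:
            # Repeated key: collect the values into a list.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value

    # Keep any text that sits alongside attributes or children.
    text = (elem.text or "").strip()
    if text:
        obj["#text"] = text
    return obj


def xml_objects(xml_text):
    """Return one plain Python object per child of the root element."""
    root = ET.fromstring(xml_text.strip())
    return [element_to_obj(child) for child in root]


records = xml_objects(SAMPLE_XML)
print(records)
```

Each item in records is an ordinary dict, so it should be usable wherever the linked example uses the objects returned by json_objects(). Note that all values come out as strings; converting numbers or dates is left to you.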

boriska