
I have not been programming for many years, but I need to get the following process automated.

A government medicine authority publishes an XML file on their website. I need to download and parse it, and pick out one of the fields that contains a URL to a .docx file. I then need to fetch that document and store it on our local filesystem as a PDF. The whole process needs to repeat every n days.
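
To make that concrete, the fetch-and-parse part in Python might look roughly like this (only a sketch: the feed URL and the tag name are made-up placeholders, not the real ones):

    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder address -- the authority's real feed URL would go here
    FEED_URL = "https://example.gov/medicines/feed.xml"

    def get_docx_url():
        # Download the published XML file
        with urllib.request.urlopen(FEED_URL) as response:
            xml_data = response.read()

        # Parse it and pick out the field holding the link to the .docx
        # ("document_url" is a made-up tag name)
        root = ET.fromstring(xml_data)
        element = root.find(".//document_url")
        return element.text if element is not None else None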

I used to know PHP quite well, but would that be OK for this task? Would Python be better? I don't have a server at work, so I was thinking of getting a Raspberry Pi.

How would you suggest I go about this?

My rough idea is to use wget or curl through a cron job to fetch the XML file, then use PHP, Python, or bash to parse it, pull down the .docx with wget or curl, and then run a PDF command-line tool on it (a sketch of that last part is below). If the results end up on a website, should I load them into a SQL database or just list them as files in a directory?
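
Spelled out a bit more (again only a sketch: it assumes LibreOffice is installed on the Pi to do the .docx-to-PDF conversion, and the directory and script names are made up):

    import subprocess
    import urllib.request
    from pathlib import Path

    OUTPUT_DIR = Path("/home/pi/medicine-docs")  # made-up local directory

    def fetch_and_convert(docx_url):
        OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
        docx_path = OUTPUT_DIR / "latest.docx"

        # Download the .docx (the same job wget/curl would do)
        urllib.request.urlretrieve(docx_url, str(docx_path))

        # Convert it to a PDF in the same directory using LibreOffice headless
        subprocess.run(
            ["libreoffice", "--headless", "--convert-to", "pdf",
             "--outdir", str(OUTPUT_DIR), str(docx_path)],
            check=True,
        )

    # A cron entry would then run the whole script on a schedule, e.g.
    # roughly every 7 days at 03:00:
    #   0 3 */7 * * /usr/bin/python3 /home/pi/fetch_docs.py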

Would appreciate any ideas.

Martin

1 Answer


I, personally, would go with node.js. It is easy to set up a node server on a Raspberry Pi, and node.js has a library for just about anything. There are a lot of simple setup tutorials out there, and SO has a lot of info, e.g. on XML parsing in node. JavaScript is pretty easy to code in.

For example, if you need a .docx converter, here is one: mammoth.js

Good Luck!

SpaceBear