I have a Python script that does the following for a folder of XML files (each file lacks a document root element):
- Read the first 7 lines of the source file and discard them, since they must not appear in the output
- Write a new file (in a separate directory) that starts with the XML declaration and an opening docroot / parent tag
- Continue reading the source file from line 8, appending each line to the new file
- Append a closing docroot / parent tag to the end of the new file

Inspiration from John Machin, Feb 1 2011.
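For reference, the steps above can be sketched roughly as follows. This is a minimal illustration rather than my exact script: the root element name `docroot`, the function names `wrap_file` and `wrap_folder`, the `*.xml` glob, and the UTF-8 encoding are all assumptions made for the example.

```python
from pathlib import Path

HEADER_LINES = 7      # number of leading lines to drop from each source file
ROOT_TAG = "docroot"  # hypothetical root element name


def wrap_file(src: Path, dst: Path, skip: int = HEADER_LINES, root: str = ROOT_TAG) -> None:
    """Copy src to dst, dropping the first `skip` lines and wrapping the rest in a root element."""
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        # Consume the header lines without writing them anywhere.
        for _ in range(skip):
            if not fin.readline():
                break  # file is shorter than the header
        fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write(f"<{root}>\n")
        # Stream the remainder line by line so memory use stays flat
        # regardless of file size.
        for line in fin:
            fout.write(line)
        fout.write(f"</{root}>\n")


def wrap_folder(src_dir: Path, dst_dir: Path) -> None:
    """Apply wrap_file to every *.xml file in src_dir, writing results to dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.xml")):
        wrap_file(src, dst_dir / src.name)
```

Calling `wrap_folder(Path("in"), Path("out"))` would process every file in a hypothetical `in` directory. Because the file is streamed, only one line is held in memory at a time.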
I have a similar solution using bash and sed. The project sponsor wants the script to be invoked by AWS Lambda and is therefore leaning towards Python as the script's language.
I'm looking for better performance and scalability (the source files currently range from 2 MB to 241 MB and may be larger in the future).
Is it better to stick with a pure Python solution, or to use Python but call out to the sed routines (or run the bash script) via the subprocess
module? Thanks.