I have a Python script that does the following for a folder of XML files (each file lacks a document root element):

  • Read the first 7 lines of the source file and discard them, since they must not appear in the output
  • Write a new file (in a separate directory) that starts with the XML declaration and an opening docroot / parent tag
  • Continue reading the source file from line 8, appending each line to the same new file
  • Append a closing docroot / parent tag to the end of the new file

Inspiration from John Machin, Feb 1 2011.
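For reference, the steps above can be sketched in pure Python roughly as follows. This is a minimal sketch, not the original script: the function name `wrap_with_root`, the root tag name `docroot`, and the UTF-8 XML declaration are all assumptions made for illustration. It works in binary mode and copies the body in large chunks, so the source file is never loaded into memory in one piece.

```python
import shutil
from pathlib import Path

SKIP_LINES = 7        # leading lines to drop, as described in the question
ROOT_TAG = "docroot"  # hypothetical root element name


def wrap_with_root(src: Path, dst: Path,
                   root_tag: str = ROOT_TAG, skip: int = SKIP_LINES) -> None:
    """Copy src to dst, dropping the first `skip` lines and adding a root element."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Assumed declaration; adjust the encoding to match the real files.
        fout.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write(f"<{root_tag}>\n".encode("utf-8"))
        # Read and discard the unwanted header lines.
        for _ in range(skip):
            fin.readline()
        # Stream the remainder in 1 MiB chunks instead of line by line.
        shutil.copyfileobj(fin, fout, 1024 * 1024)
        fout.write(f"</{root_tag}>\n".encode("utf-8"))
```

Copying in chunks rather than iterating over lines is one way to keep the per-line Python overhead low for the larger files.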

I have a similar solution using bash and sed. The project sponsor wants the script to be called by AWS Lambda and is therefore leaning towards Python as the script's language.

I'm looking for better performance and scalability (the source files range in size from 2 MB to 241 MB, and may be larger in the future). Is it better to stick to a pure Python solution, or to use Python but call out to the sed routines (or run the bash script) using the subprocess module? Thanks.

Adam T
  • Pure Python will be easier to integrate and debug, but raw text search & replace may be slower than a compiled `sed`. I wouldn't extend that to `bash`, though. – Jean-François Fabre Sep 06 '17 at 13:49
  • Thanks @Jean-FrançoisFabre but I'm not sure I follow what you mean by `compiled sed` and `extend to bash`? Are you suggesting that I use an object of subprocess to call out to sed command(s), and not use subprocess to invoke a bash script? – Adam T Sep 06 '17 at 14:01
  • I recommend that you use sed for maximum performance, but don't write your commands using bash if you can avoid it; rather, chain fast, compiled commands like sed from a native Python script. You don't need bash. – Jean-François Fabre Sep 06 '17 at 14:02
  • Ah, ok, I think I get what you mean. Just to be sure, you're suggesting something along the lines of `subprocess.call(["sed -i -e 's/hello/helloworld/g' www.txt"], shell=True)`, correct? – Adam T Sep 06 '17 at 14:08
  • yup sth like that – Jean-François Fabre Sep 06 '17 at 14:08

1 Answer

Per the advice from Jean-François Fabre, I used subprocess.call() and it worked great. Thanks again for your help.
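The exact command used is not shown here, so the following is only a sketch of what such a call might look like. It assumes a `sed` binary is available on the PATH of the execution environment, and the function name `wrap_with_sed` and the root tag `docroot` are hypothetical. It passes the arguments as a list, without `shell=True`, so no shell is started and file names need no quoting.

```python
import subprocess
from pathlib import Path


def wrap_with_sed(src: Path, dst: Path,
                  root_tag: str = "docroot", skip: int = 7) -> None:
    """Write dst as: XML declaration, opening tag, src minus its first `skip` lines, closing tag."""
    with open(dst, "wb") as fout:
        # Assumed declaration and root tag name; adjust to the real files.
        fout.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write(f"<{root_tag}>\n".encode("utf-8"))
        # Flush Python's buffer so the header is on disk before sed writes
        # to the same file descriptor.
        fout.flush()
        # "1,7d" deletes lines 1 through 7 and prints everything else.
        subprocess.run(["sed", f"1,{skip}d", str(src)], stdout=fout, check=True)
        fout.write(f"</{root_tag}>\n".encode("utf-8"))
```

With `check=True`, a non-zero exit status from sed raises `subprocess.CalledProcessError` instead of silently producing a truncated file.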

Adam T