I have the following Linux command which I am using to extract data from one very large log file.

sed -n "/<trade>/,/<\/trade>/p" Large.log > output.xml

However, all of the output goes into a single file, output.xml. I want to create a new file for every <trade> ... </trade> block that is matched, with each file named after the <id> tag inside that block.

Something like this:

sed -n "/<trade>/,/<\/trade>/p" Large.log > "/<id>/,/<\/id>/p".xml

However, that, of course, does not work and I am not sure how to apply a regex as a naming rule.

P.S. At this point, I am also not sure whether I should use sed or try to achieve this with awk.

Cyrus
Ne7WoRK
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Dec 16 '20 at 21:27
  • Thanks for the suggestion, @Cyrus, but I am not allowed to use third-party libraries for this one. – Ne7WoRK Dec 16 '20 at 21:45
  • Suggestion 2: Take a look at XSLT/XPath, which can also be used with an XML/HTML parser, or let the endpoint do that server-side instead. But OK, while I was writing this you posted your 'not allowed' comment. @Cyrus Thanks a lot for the link, LOL. Now I know where the pandemic disease comes from ;-) – koyaanisqatsi Dec 16 '20 at 21:56
  • Since the output filename is determined by the contents of the file, I'd opt for an `awk` solution, making sure to `close()` each file once you're done writing to it (some `awk` implementations cannot keep many file descriptors open). How easy or hard the parsing is with `awk` will depend on the formatting of the file. If you have problems with an `awk` solution, I'd suggest starting a new question and making sure to provide a) sample input, b) the `awk` code you've written, c) the wrong output generated by your code, and d) the desired output. – markp-fuso Dec 16 '20 at 22:24
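
For what it's worth, the `awk` approach from the last comment might look roughly like the sketch below. It is not a general XML parser and rests on assumptions that the question does not confirm: that `<trade>`, `</trade>` and `<id>...</id>` each appear on a line of their own, and that the ids are unique and safe to use as filenames. The sample log, its contents and the ids `1001`/`1002` are made up for illustration.

```shell
#!/bin/sh
# Sketch: write each <trade>...</trade> block of a log to its own file,
# named after the <id> found inside that block.
# Assumption: <trade>, </trade> and <id>...</id> each sit on one line.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input standing in for Large.log.
cat > Large.log <<'EOF'
2020-12-16 21:00:01 INFO received message
<trade>
  <id>1001</id>
  <amount>250</amount>
</trade>
2020-12-16 21:00:02 INFO received message
<trade>
  <id>1002</id>
  <amount>75</amount>
</trade>
EOF

awk '
  # Opening tag: start buffering a new block.
  /<trade>/ { inside = 1; block = ""; id = "" }

  inside {
    block = block $0 "\n"
    # Pull the id out of a line such as "<id>1001</id>".
    # "<id>" is 4 characters and "</id>" is 5, hence the offsets.
    if (match($0, /<id>[^<]*<\/id>/))
      id = substr($0, RSTART + 4, RLENGTH - 9)
  }

  # Closing tag: write the buffered block, then release the file.
  /<\/trade>/ && inside {
    if (id != "") {
      out = id ".xml"
      printf "%s", block > out
      close(out)   # avoids running out of open file descriptors
    }
    inside = 0
  }
' Large.log

ls
```

The block is buffered until `</trade>` because the filename is not known until the `<id>` line has been seen. Blocks without an `<id>` are silently skipped here, and a repeated id would overwrite the earlier file, so both cases would need handling on real data.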
