awk to print lines matching a pattern

Question

I have an MPD file, named `mpd`, that looks like this:

    <BaseURL>01/</BaseURL>
    <SegmentList timescale="1000">
    <SegmentURL media="1.ts" mediaRange="0-6003779"/>
    <SegmentURL media="2.ts" mediaRange="0-7313387"/>
    <BaseURL>02/</BaseURL>
    <SegmentList timescale="1000">
    <SegmentURL media="1.ts" mediaRange="0-6003779"/>
    <SegmentURL media="2.ts" mediaRange="0-7313387"/>
    <BaseURL>01/</BaseURL>
    <SegmentList timescale="1000">
    <SegmentURL media="3.ts" mediaRange="0-6003779"/>
    <SegmentURL media="4.ts" mediaRange="0-7313387"/>
    <BaseURL>02/</BaseURL>
    <SegmentList timescale="1000">
    <SegmentList timescale="1000">
    <SegmentURL media="3.ts" mediaRange="0-6003779"/>
    <SegmentURL media="4.ts" mediaRange="0-7313387"/>

I want to save the `<SegmentURL>` lines for each `<BaseURL>` into a separate file.

My desired output (for the `01/` file) is:

    <BaseURL>01/</BaseURL>
    <SegmentURL media="1.ts" mediaRange="0-6003779"/>
    <SegmentURL media="2.ts" mediaRange="0-7313387"/>
    <SegmentURL media="3.ts" mediaRange="0-6003779"/>
    <SegmentURL media="4.ts" mediaRange="0-7313387"/>

I have tried the following command, but it doesn't work as expected: it only prints the last segment URLs in the mpd file, and I am confused about why awk keeps only the last entries. Any help would be appreciated.

  awk '
# start writing to new segment file segment.01 etc
match($0, /<BaseURL>([0-9]+)\/<\/BaseURL>/, m) {
  base=m[1]
  close(segf)
  segf="segment." base
  print "write segments to " segf
  print >segf
}
/<SegmentURL / {print >segf}
END {close(segf)}
' mpd

You say you want to save them in different files, but you're saving everything into `segment.01`. Where is your code to write to different files? Also, you're only matching `<BaseURL>` lines that have a sequence of `0` and `1` characters; it won't match `02`. — Barmar, Sep 08 '16 at 21:05
The regexp should be `<BaseURL>([0-9]+)\/<\/BaseURL>` to match any number. — Barmar, Sep 08 '16 at 21:06
Are you asking how to use the number from the capture group in the regexp in the `segf` filename? See http://stackoverflow.com/questions/10913598/how-to-get-sub-expression-value-of-regexp-in-awk — Barmar, Sep 08 '16 at 21:10
I tried this, but I still see a similar issue. Now I get two files, segment.01 and segment.02, but not all the segment URLs are saved in them: `awk ' # start writing to new segment file segment.01 etc match($0, /<BaseURL>([0-9]+)\/<\/BaseURL>/, m) { base=m[1] close(segf) segf="segment." base print "write segments to " segf print >segf } /<SegmentURL / {print >segf} END {close(segf)} ' mpd` — Raj, Sep 08 '16 at 21:33
That's impossible to read, add it as an update to the question so you can format it readably. — Barmar, Sep 08 '16 at 21:37
It shouldn't be necessary, but try using `>>` instead of `>`. — Barmar, Sep 08 '16 at 21:39
Some call it [summoning the daemon](https://www.metafilter.com/86689/), others refer to it as [the Call for Cthulhu](https://blog.codinghorror.com/parsing-html-the-cthulhu-way/) and few just [turned mad and met the Pony](https://stackoverflow.com/a/1732454/8344060). In short, never parse XML or HTML with a regex! Did you try an XML parser such as `xmlstarlet`, `xmllint` or `xsltproc`? — kvantour, Nov 20 '19 at 13:19
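The three-argument form `match($0, regex, m)` used in the question is a GNU awk extension. For awks without it, a minimal sketch (assuming POSIX sh and awk; the sample lines are copied from the question) uses the two-argument `match()`, which only sets `RSTART` and `RLENGTH`, and then strips everything except the digits:

```shell
# Two BaseURL lines and one SegmentURL line (a non-match) from the question.
out=$(printf '%s\n' '<BaseURL>01/</BaseURL>' '   <BaseURL>02/</BaseURL>' \
    '   <SegmentURL media="1.ts" mediaRange="0-6003779"/>' |
  awk '
    # POSIX match() sets RSTART and RLENGTH; there is no capture array.
    match($0, /<BaseURL>[0-9]+\/<\/BaseURL>/) {
      tag = substr($0, RSTART, RLENGTH)   # the whole <BaseURL>NN/</BaseURL> tag
      gsub(/[^0-9]/, "", tag)             # keep only the digits, e.g. 01
      print "base=" tag
    }')
printf '%s\n' "$out"   # base=01, then base=02
```

This relies on the tag name `BaseURL` itself containing no digits, so `gsub` leaves exactly the base number.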

score 0 · Answer 1 · answered Sep 08 '16 at 22:00


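  awk '
# match() with a third (array) argument is a GNU awk extension
match($0, /<BaseURL>([0-9]+)\/<\/BaseURL>/, m) {
  base=m[1]
  close(segf)                  # done with the previous segment file
  segf="segment." base
  print "write segments to " segf
  # ">>" appends; ">" would empty the file again each time it is
  # reopened after close(), leaving only the last group of lines
  print >>segf
}
/<SegmentURL / {print >>segf}
# because of ">>", remove old segment.* files before rerunning
END {close(segf)}
' mpd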
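A minimal sketch of why `>>` matters here, assuming POSIX sh and awk and a scratch directory from `mktemp -d` (`with_gt` and `with_append` are made-up file names). Each file is closed after every write, which is what the question's script does at each new `<BaseURL>`, and then reopened for the next write:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Reopening with ">" after close() truncates: only the last line survives.
awk 'BEGIN {
  for (i = 1; i <= 3; i++) { print "line " i > "with_gt"; close("with_gt") }
}'

# Reopening with ">>" appends, so all three lines are kept.
awk 'BEGIN {
  for (i = 1; i <= 3; i++) { print "line " i >> "with_append"; close("with_append") }
}'

cat with_gt       # line 3
cat with_append   # line 1, line 2, line 3
```

Without `close()`, `>` truncates only when the file is first opened in a run, which is why leaving out `close(segf)` is an alternative fix.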

Raj

`>>` is needed because you close the file whenever you get to a new `<BaseURL>`. The first time you write to a file after opening it, `>` empties the file. If you just leave out `close(segf)`, I think it should also work. – Barmar Sep 08 '16 at 22:02
When I try to run the above script, it says ' for reading (No such file or directory)e `mpd. I am in the same directory as the mpd file when running it, and I have given the script the required permissions (chmod +x run.sh), but I still see that error. – Raj Sep 08 '16 at 23:23
It looks like the error message has gotten jumbled up in your comment. – Barmar Sep 09 '16 at 16:10
Is the awk command in a script? I suspect you have CRLF instead of LF as your newlines, use `dos2unix` to fix the file. – Barmar Sep 09 '16 at 16:11
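A sketch of how to confirm and fix CRLF line endings when `dos2unix` is not installed, assuming POSIX `od` and `tr`; `run.sh` here is a made-up stand-in for the script from the comments. A carriage return glued to `mpd` makes awk look for a file literally named `mpd\r`, and printing that name moves the cursor back to the start of the line, which is consistent with the jumbled error message:

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical run.sh saved with Windows (CRLF) line endings.
printf 'awk %s mpd\r\n' "'{ print }'" > run.sh
printf 'hello\n' > mpd

# od -c shows a carriage return as \r.
if od -c run.sh | grep -q '\\r'; then echo "run.sh has CRLF line endings"; fi

# Strip the carriage returns (roughly what dos2unix does), then run it.
tr -d '\r' < run.sh > run_unix.sh
sh run_unix.sh   # hello
```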

Atul23 · Answer 2 · 2019-11-26T09:53:27.200

Here is my answer

awk 'function writeFile(n) { print $0 >> ("File_" n) }
BEGIN { FS = "[<>=]"; a = 0 }
$2 == "BaseURL" { a++; writeFile(a) }
$2 == "SegmentURL media" { writeFile(a) }' mpd

Explanation: use several field separators (`<`, `>` and `=`) so that the tag name lands in `$2` and can be compared exactly, and keep a counter of the `BaseURL` lines seen so far. Each time a `BaseURL` line is encountered, the counter is incremented and passed to the user-defined function `writeFile`; because the counter changes, a new output file is opened for each `BaseURL` block.
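A quick sketch of how `FS = "[<>=]"` splits the lines, assuming POSIX sh and awk and two sample lines copied from the question:

```shell
out=$(printf '%s\n' '   <BaseURL>01/</BaseURL>' \
    '   <SegmentURL media="1.ts" mediaRange="0-6003779"/>' |
  awk 'BEGIN { FS = "[<>=]" }      # split on <, > and =
       { printf "$2=[%s]\n", $2 }')
printf '%s\n' "$out"   # $2=[BaseURL], then $2=[SegmentURL media]
```

The leading spaces end up in `$1`, so the comparisons on `$2` work regardless of indentation.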

Output:

File_4 File_3 File_2 File_1
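The counter produces one file per `<BaseURL>` occurrence, so the two `01/` groups land in different files. If you want one file per base number, as in the question's desired output, here is a sketch that follows Barmar's suggestion of not calling `close()`. It assumes POSIX sh and awk; the scratch directory and the copy of the sample data exist only for the demo, and the `segment.NN` names follow the question:

```shell
# Scratch directory and a copy of the sample data from the question.
cd "$(mktemp -d)" || exit 1
cat > mpd <<'EOF'
<BaseURL>01/</BaseURL>
   <SegmentList timescale="1000">
   <SegmentURL media="1.ts" mediaRange="0-6003779"/>
   <SegmentURL media="2.ts" mediaRange="0-7313387"/>
   <BaseURL>02/</BaseURL>
   <SegmentList timescale="1000">
   <SegmentURL media="1.ts" mediaRange="0-6003779"/>
   <SegmentURL media="2.ts" mediaRange="0-7313387"/>
   <BaseURL>01/</BaseURL>
   <SegmentList timescale="1000">
   <SegmentURL media="3.ts" mediaRange="0-6003779"/>
   <SegmentURL media="4.ts" mediaRange="0-7313387"/>
   <BaseURL>02/</BaseURL>
   <SegmentList timescale="1000">
   <SegmentList timescale="1000">
   <SegmentURL media="3.ts" mediaRange="0-6003779"/>
   <SegmentURL media="4.ts" mediaRange="0-7313387"/>
EOF

awk '
# POSIX two-argument match(); the digits of the tag become the file suffix.
match($0, /<BaseURL>[0-9]+\/<\/BaseURL>/) {
  base = substr($0, RSTART, RLENGTH)
  gsub(/[^0-9]/, "", base)                 # "01", "02", ...
  segf = "segment." base
  # Write the <BaseURL> line only the first time this base is seen.
  if (!(segf in seen)) { seen[segf] = 1; print > segf }
  next
}
# No close(): each file stays open, so ">" truncates it only once per run.
/<SegmentURL / && segf != "" { print > segf }
' mpd

cat segment.01
```

Because every output file stays open until awk exits, rerunning the script does not duplicate lines. The trade-off is one open file per distinct base, which may hit an open-file limit in some awk implementations if there are many bases.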
