What's a regex that will match lines whose previous line starts with a set of characters?

I'm trying to parse M3U files, and I need to match each line whose preceding line starts with `#EXTINF:`. So if we take this example:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:11.54
ASMIK_tid_0000250058_m.600000-00000.ts
#EXTINF:8.51
ASMIK_tid_0000250058_m.600000-00001.ts
#EXTINF:11.76
ASMIK_tid_0000250058_m.600000-00002.ts
#EXTINF:10.05
ASMIK_tid_0000250058_m.600000-00003.ts
etc...

I only want to extract these lines:

ASMIK_tid_0000250058_m.600000-00000.ts
ASMIK_tid_0000250058_m.600000-00001.ts
ASMIK_tid_0000250058_m.600000-00002.ts
ASMIK_tid_0000250058_m.600000-00003.ts

I've tried variations on this answer and this: (?#EXT.*\n) but had no luck...

Eric
  • Could you post the most successful variation that you have tried? Also, could it be that those lines you're looking for are those lines that don't start with `#`? – Jerry Dec 17 '13 at 11:57
  • I added my best attempt at the regex. – Eric Dec 17 '13 at 12:33
  • Is that really it? It's far different from the answer you linked in your question... You might want to try something like: `#EXT[^\r\n]*[\r\n]+([^#][^\r\n]+)` The lines you're looking for are in the first capture group. – Jerry Dec 17 '13 at 13:22
  • Thanks Jerry but I can't get your regex to match just the lines I want. [See here](http://regexr.com?37m9f). If you can help me out, please post an answer. – Eric Dec 17 '13 at 15:30
  • Do you know whether Objective-C's regex supports variable-width lookbehinds? I don't know how to write code in Objective-C, otherwise I'd try it out with [ideone](http://ideone.com). Otherwise, are you using replace? If so, you could perhaps use something like [that](http://regexr.com?37m9i)? – Jerry Dec 17 '13 at 15:35
  • [Looks like it does support lookbehinds](http://stackoverflow.com/questions/4250114/regex-issue-using-icu-regex-to-find-numbers-not-inside-parentheses), but I haven't figured that out yet. Your regex works in the test harness but I need to figure out how to translate it to Obj-C. – Eric Dec 17 '13 at 16:10
  • Okay, it seems like Objective C supports variable width lookbehind. Could you perhaps try `(?<=#EXT[^\r\n]*[\r\n]+)[^#][^\r\n]+` with match? If that works, I'll delete this comment and put it as answer. – Jerry Dec 17 '13 at 16:31
  • Thanks @Jerry, I tried that regex, but got errors. [I looked for help](http://stackoverflow.com/questions/20640375/cocoa-error-2048-when-using-nsregularexpression-in-cocoa) but haven't been able to figure it out. – Eric Dec 18 '13 at 09:24
  • I'm stumped. Do you think you could put something on ideone, the site I linked before, and post the link to the code sample? It would be easier to debug with the interactive code and I won't have to wait for you to check if something works or not. Hopefully, the site has the necessary libraries to run regex. – Jerry Dec 18 '13 at 09:35
  • ideone doesn't have the Cocoa APIs, and `NSRegularExpression` is part of them. I tried using [this](http://www.compileonline.com/compile_objective-c_online.php) but I had the same problem... At any rate, your regex does work, so at this stage it's an Obj-C problem which I'll figure out elsewhere. Thanks! – Eric Dec 18 '13 at 10:09
  • Ah okay. Good luck with that then :) – Jerry Dec 18 '13 at 11:21

1 Answer

First, make sure the function you are using matches against the whole file rather than one line at a time. A pattern that depends on the previous line cannot work if it only ever sees a single line.

Then you would need to specify a lookbehind:

(?<=#EXTINF.*\r\n).*

If your regex implementation does not support lookbehinds OR repetition inside of a lookbehind, you can use two capture groups instead:

(#EXTINF.*\r\n)(.*)

You would then ignore the first capture group and keep only the data in the second capture group.
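As a rough illustration of the two-capture-group approach, here is a minimal sketch in Python (a stand-in only; the question is about Objective-C, and Python's `re` happens to be an engine that rejects variable-width lookbehinds). The names `PLAYLIST` and `segment_lines` are made up for this example, and the pattern is a slight variation on the one above: it uses `[^\r\n]*` and `\r?\n` so that it copes with both LF and CRLF line endings.

```python
import re

# Sample playlist taken from the question; joining with "\n" is an assumption
# about the file's line endings.
PLAYLIST = "\n".join([
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:10",
    "#EXTINF:11.54",
    "ASMIK_tid_0000250058_m.600000-00000.ts",
    "#EXTINF:8.51",
    "ASMIK_tid_0000250058_m.600000-00001.ts",
])

# Group 1: the #EXTINF line including its line ending (ignored).
# Group 2: the line that follows it (the part we want).
PATTERN = re.compile(r"(#EXTINF[^\r\n]*\r?\n)([^\r\n]*)")


def segment_lines(text):
    """Return every line whose preceding line starts with #EXTINF."""
    return [match.group(2) for match in PATTERN.finditer(text)]


segments = segment_lines(PLAYLIST)
for line in segments:
    print(line)
```

The key point is that the pattern runs over the whole file contents in one go, and only group 2 is kept from each match.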

If you need to state explicitly that the . does not match newlines, you can turn off single-line (dot-all) mode at the beginning of the regex: (?-s)
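To see why the dot-all setting matters here, the following small Python sketch compares the same pattern with and without it. This is only an illustration under assumptions: Python controls the mode with the `re.DOTALL` flag rather than a leading `(?-s)`, the sample text is invented, and it uses plain `\n` line endings.

```python
import re

# Invented two-entry sample with LF line endings (an assumption).
text = "#EXTINF:11.54\nfirst.ts\n#EXTINF:8.51\nsecond.ts"
pattern = r"#EXTINF.*\n(.*)"

# Default mode: "." stops at a newline, so each #EXTINF line is matched
# separately and the group captures the line right after it.
default_mode = re.findall(pattern, text)

# Dot-all mode: "." also matches newlines, so the first ".*" runs across
# line boundaries and the per-line pairing is lost.
dotall_mode = re.findall(pattern, text, flags=re.DOTALL)

print(len(default_mode), len(dotall_mode))
```

With dot-all switched on, the greedy `.*` swallows several lines at once, so you get fewer matches than there are `#EXTINF` entries.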

Vasili Syrakis