I have a large text file that contains data in the following format:

AUTHOR: John_Doe
TITLE: This is a title
BASENAME: this_is_a_title

AUTHOR: Jill_Doe
TITLE: Another Title
BASENAME: another_title

AUTHOR: Jack
TITLE: Last Title
BASENAME: last_title

How do I find all of the underscores in the document but only on lines that begin with 'BASENAME:'? I've tried lookbehinds and groupings but my limited regex knowledge just has me spinning in circles. Any thoughts? Thanks!

Edit: Sorry all, was away from my desk last night. I'm not sure what flavour of regex I'm using; it's for an advanced search in Sublime Text 2. To clarify, I'm trying to find the underscores on BASENAME lines so that I can change them to dashes, so I'm looking for a regex that matches the underscores only.

John Ryan
  • Which language (i.e. what flavor of regex)? Is `/^BASENAME:.*_/` close to what you want? – Beta Oct 02 '12 at 21:01
  • Not sure of the language; it's for a search and replace in Sublime Text 2. I'm trying to replace the underscores with dashes, so I need the underscores only. – John Ryan Oct 03 '12 at 13:29

2 Answers

I'd suggest `^BASENAME:[^_]*_.*$`. This will find all such lines. Or do you want to find all underscores individually instead of the containing lines?

usta
  • Be sure and specify multiline mode so the `^` can match the beginning of the line. – Alan Moore Oct 02 '12 at 21:22
  • Hey Usta, just looking for the underscores themselves. – John Ryan Oct 03 '12 at 13:27
  • @JohnRyan Even in that case, and if I *had* to use regex, I would still find the lines first and then go for individual underscores within the selected lines, to keep things simple. – usta Oct 03 '12 at 14:12
  • Thanks @Usta, ended up using your advice. Found the lines that start with BASENAME and then searched within the selection. – John Ryan Oct 03 '12 at 15:24
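For anyone who would rather script the two-step approach described in the comments above than do it by hand in the editor, here is a minimal sketch in Python. It assumes Python's `re` module rather than Sublime Text's search engine, and the function name `dash_basenames` is made up for illustration. It first matches whole lines that start with `BASENAME:` (multiline mode, as noted above) and then replaces underscores only inside each matched line.

```python
import re


def dash_basenames(text):
    """Replace underscores with dashes only on lines starting with 'BASENAME:'."""
    # Step 1: re.MULTILINE lets ^ and $ anchor at every line, so the pattern
    # selects each whole BASENAME line.
    # Step 2: the callback rewrites underscores within that line only.
    return re.sub(
        r"^BASENAME:.*$",
        lambda match: match.group(0).replace("_", "-"),
        text,
        flags=re.MULTILINE,
    )


sample = (
    "AUTHOR: John_Doe\n"
    "TITLE: This is a title\n"
    "BASENAME: this_is_a_title\n"
)
print(dash_basenames(sample))
```

The underscore in the AUTHOR line is left alone because that line never matches the first pattern.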

I'd use grouping to get the underscores; otherwise the match will return all of the text:

(?<=^BASENAME:)(.+?(?<underscore>_))*

You'll find the underscores in the group named underscore.

If your language doesn't support named groups you can instead use

(?<=^BASENAME:)(.+?(_))*

Gabber
  • That lookahead is pointless. You have to consume all those characters either way, so why not do it the easy way like @usta did? Also, it's not a good idea to use named groups and numbered groups in the same regex; some flavors will even reject it as an error. And how do you know named groups will work, anyway? The OP still hasn't said what flavor he's using. – Alan Moore Oct 03 '12 at 00:43
  • You are right, it was going to be a lookbehind to avoid the word "BASENAME" itself. Thanks for the named groups suggestion – Gabber Oct 03 '12 at 05:38
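One practical caveat with the grouping approach is worth illustrating. The sketch below assumes Python's `re` module, which is not the engine Sublime Text uses, and the helper names are invented for the example. In Python, a capture group inside a repeated group keeps only its last capture, so the numbered-group pattern reports one underscore per line rather than all of them. Some flavors (.NET, for instance) do expose every capture, so this behavior depends on the flavor.

```python
import re


def last_captured_underscore(line):
    """Offset of the underscore left in group 2 of the numbered-group pattern."""
    match = re.search(r"(?<=^BASENAME:)(.+?(_))*", line)
    # Group 2 sits inside a repeated group, so Python keeps only its final capture.
    if match is None or match.group(2) is None:
        return None
    return match.start(2)


def all_underscores(line):
    """Offsets of every underscore, checked only on lines starting with 'BASENAME:'."""
    if not line.startswith("BASENAME:"):
        return []
    return [m.start() for m in re.finditer("_", line)]


line = "BASENAME: this_is_a_title"
print(last_captured_underscore(line))
print(all_underscores(line))
```

Checking the line prefix first and then searching for plain `_` returns every position, which is effectively what finding the lines first and searching within the selection does in the editor.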