3

I'm trying to put together a regex that returns the string between _ and _$ (where $ is the end of the string).

input:

abc_def_ghi_

desired regex outcome:

def_ghi

I've tried quite a few combinations, such as this:

((([^_]*){1})[^_]*)_$

Any help appreciated.

Note: the regex above returns abc_def, and not the desired def_ghi.

sgtz

3 Answers

4

So it's everything between the first _ and the final _ (both excluded)?

Then try

(?<=_).*(?=_$)

(hoping you're not using JavaScript)

Explanation:

(?<=_)  # Assert that the previous character is a _
.*      # Match any number of characters...
(?=_$)  # ... until right before the final, string-ending _
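
As a rough sketch of how the lookaround pattern above behaves, here is a small Python example. Python's `re` module is an assumption (the question doesn't say which engine is in use), and the helper name `between_underscores` is made up for illustration.

```python
import re

# Lookbehind for a "_" just before the match, lookahead for the
# string-ending "_" just after it. Neither underscore is part of the match.
pattern = re.compile(r"(?<=_).*(?=_$)")

def between_underscores(text):
    """Return the text between the first "_" and the final "_", or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

print(between_underscores("abc_def_ghi_"))  # def_ghi
```

Because the lookarounds only assert and don't consume, the whole match (`group(0)`) is already the wanted text, with no capture group needed.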
Tim Pietzcker
1

You could try to use the greediness of the quantifiers to your advantage:

^.*?_(.*)_$

This matches everything from the start (non-greedily) up to the first underscore, then everything from that underscore up to the end of the string, where it expects an underscore followed by the end of the string. The part between the two underscores is captured in the first group.

^    Beginning of string
.*?  Any number of characters (zero or more), as few as possible
_    Literal underscore (the first one in the string)
(.*) Any number of characters, greedy, captured in group 1
_    Literal underscore (the final one)
$    End of string
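
A minimal sketch of using this pattern with a regular match and reading the first capture group, assuming Python's `re` module (the engine isn't stated in the question); the function name `capture_middle` is hypothetical.

```python
import re

# Lazy prefix up to the first "_", then a greedy group that backtracks
# just far enough to leave the final "_" before the end of the string.
pattern = re.compile(r"^.*?_(.*)_$")

def capture_middle(text):
    """Return capture group 1 (the text between first and last "_"), or None."""
    match = pattern.match(text)
    return match.group(1) if match else None

print(capture_middle("abc_def_ghi_"))  # def_ghi
```

Unlike the lookaround version, the full match here is the whole string; the wanted text lives only in group 1.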
0xCAFEBABE
  • I'm not sure why you are using .split(). Using a regular match with the mentioned RegEx should return one captured group, the one you were looking for. – 0xCAFEBABE Jun 21 '12 at 08:18
0

I was searching for this within a larger log entry:

"threat_name":"PUP.Optional.Wajam"

The format enclosed the field name in double quotes then a colon then the value in double quotes.

Here's what I ended up with to avoid punctuation breaking the regex:

threat_name["][:]["](?P<signature>.*?)["]

(from regex101.com)

  • threat_name matches the characters threat_name literally (case sensitive)
  • ["] matches a single character present in the list below
      • " matches the character " literally (case sensitive)
  • [:] matches a single character present in the list below
      • : matches the literal character :
  • ["] matches a single character present in the list below
      • " matches the character " literally (case sensitive)
  • (?P<signature>.*?) Named capturing group signature
      • .*? matches any character (except newline)
      • Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
  • ["] matches a single character present in the list below
      • " matches the character " literally (case sensitive)
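
To show the named group in use, here is a small sketch assuming Python's `re` module, which supports the `(?P<name>...)` syntax. The surrounding log entry is invented for illustration; only the `threat_name` field comes from the example above, and `extract_signature` is a hypothetical helper name.

```python
import re

# Single-character classes like ["] and [:] keep the punctuation literal
# without needing backslash escapes.
pattern = re.compile(r'threat_name["][:]["](?P<signature>.*?)["]')

# Hypothetical log entry; the other fields are made up for this sketch.
log_entry = '{"scan_id":42,"threat_name":"PUP.Optional.Wajam","action":"quarantine"}'

def extract_signature(entry):
    """Return the value of the threat_name field, or None if it is absent."""
    match = pattern.search(entry)
    return match.group("signature") if match else None

print(extract_signature(log_entry))  # PUP.Optional.Wajam
```

The lazy `.*?` stops at the first closing double quote, so later fields in the same entry are not swallowed into the captured value.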
Vince Bowdren