14

This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted...

Say, I have the following line:

"Wouldn't It Be Nice" (B. Wilson/Asher/Love)

I would have to look for this pattern:

" (<any string>)

In order to retrieve:

B. Wilson/Asher/Love

I tried something like "" (([^))]*)) but it doesn't seem to work. Also, I'd like to use Match.SubMatches(0), which might complicate things a bit because it relies on brackets...

JimmyPena
Daan
  • Possible duplicate of [Regular expression to Extract substring](http://stackoverflow.com/q/1624387/190829) – JimmyPena Jun 06 '12 at 12:59

6 Answers

25

Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work: ""[ \xA0]*\(([^)]+)\)

""       'quote (twice to escape)
[ \xA0]* 'zero or more non-breaking (\xA0) or regular spaces
\(       'left parenthesis
(        'open capturing group
[^)]+    'anything not a right parenthesis
)        'close capturing group
\)       'right parenthesis

In a function:

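' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
' (Tools > References in the VBA editor)
Public Function GetStringInParens(search_str As String) As String
    Dim regEx As New VBScript_RegExp_55.RegExp
    Dim matches
    GetStringInParens = ""
    ' closing quote, optional regular/non-breaking spaces, then "(...)"
    regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
    regEx.Global = True
    If regEx.test(search_str) Then
        Set matches = regEx.Execute(search_str)
        ' SubMatches(0) is the text captured by the first group
        GetStringInParens = matches(0).SubMatches(0)
    End If
End Function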
alan
  • Annoyingly, it doesn't seem to work. I tried your literal method as well as incorporating it into my method... It really seems to be an issue with the regex itself: as soon as I replace only the regex with a working one, everything goes well. Anyway, I thought it might be useful to give you the exact .docm file I have now, so you can have a look: http://db.tt/6XoO1Pbn The input text is already in the doc. Thanks in advance! – Daan Jun 05 '12 at 22:00
  • See my edit. Looks like there are non-breaking spaces in the document. That's what was messing us up. Hope it works for you now. – alan Jun 06 '12 at 02:44
  • This one is definitely working! I was concerned about the right boundary, which causes a mismatch when a `)` appears inside the parenthesised content. I wanted to propose letting the regex find the last `)` in the line, but then I found this string: `"They Called It Rock" (Lowe, Rockpile, Dave Edmunds) - 3:10 (bonus single-sided 45, credited as Rockpile, not on original LP)`. There goes my plan :) BTW, `) ` or `) - ` wouldn't work either, since the dash can differ and sometimes there's nothing at all after the `)`. I guess this can't be improved, agreed? – Daan Jun 06 '12 at 14:12
  • I don't see the problem. It's matching `Lowe, Rockpile, Dave Edmunds`, and not `(bonus ... LP)`. That's what you want, right? If you're seeing something different, I'm not sure why, but, no, I would say it can't be improved. – alan Jun 06 '12 at 14:26
  • @alan thank you. You refer to `.test` in `regEx.test(search_str)`. I don't understand where this `.test` comes from. What is it? – BenKoshy May 21 '16 at 01:32
  • @BKSpurgeon `test()` is a method on the regular expression object `regex`. You pass `test()` a string as a parameter. If the string matches the `Pattern` attribute of `regex`, `test()` returns `True`. Otherwise, `False`. See https://msdn.microsoft.com/en-us/library/y32x2hy1(v=vs.84).aspx – alan May 21 '16 at 13:04
  • This was good. It helped me figure out that the grouping is addressed in the submatches(i). That particular fact is not clearly stated in any documentation or explanation I have found. – Ken Ingram Aug 29 '19 at 04:46
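The function in this answer uses early binding, so it needs a reference to "Microsoft VBScript Regular Expressions 5.5". A late-bound sketch that creates the object with `CreateObject("VBScript.RegExp")` avoids setting that reference. It assumes Windows, where that ProgID is registered, and the name `GetStringInParensLate` is hypothetical.

```vba
' Late-bound sketch: no reference to "Microsoft VBScript Regular
' Expressions 5.5" is needed; the RegExp object is created at run time.
' Assumes Windows, where the VBScript.RegExp ProgID is registered.
Public Function GetStringInParensLate(ByVal search_str As String) As String
    Dim regEx As Object
    Dim matches As Object

    Set regEx = CreateObject("VBScript.RegExp")
    ' closing quote, optional regular/non-breaking spaces, then "(...)"
    regEx.Pattern = """[ \xA0]*\(([^)]+)\)"

    Set matches = regEx.Execute(search_str)
    If matches.Count > 0 Then
        ' SubMatches(0) holds the text captured by the first group
        GetStringInParensLate = matches(0).SubMatches(0)
    End If
    ' with no match the function returns its default value, ""
End Function
```

`Debug.Print GetStringInParensLate("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")` should print `B. Wilson/Asher/Love`. Declaring the variables `As Object` gives up IntelliSense and compile-time type checks in exchange for not having to set the reference.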
4

Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than Regex.

Function BetweenParentheses(s As String) As String
    BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
        InStr(s, ")") - InStr(s, "(") - 1)
End Function

Usage:

Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love

EDIT: @alan points out that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a little modification:

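Function BetweenParentheses(s As String) As String
    Dim iEndQuote As Long
    Dim iLeftParenthesis As Long
    Dim iRightParenthesis As Long

    ' Only search after the last double quote, i.e. after the song title
    iEndQuote = InStrRev(s, """")
    If iEndQuote = 0 Then iEndQuote = 1   ' no quote at all: search from the start
    iLeftParenthesis = InStr(iEndQuote, s, "(")
    If iLeftParenthesis <> 0 Then
        ' look for the closing parenthesis after the opening one
        iRightParenthesis = InStr(iLeftParenthesis, s, ")")
    End If

    If iLeftParenthesis <> 0 And iRightParenthesis <> 0 Then
        BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
            iRightParenthesis - iLeftParenthesis - 1)
    End If
End Function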

Usage:

Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Debug.Print BetweenParentheses("""Don't talk (yell)""")
' returns empty string

Of course this is less concise than before!

Jean-François Corbett
  • I thought of suggesting this, too, but it falsely matches "Don't Talk (Put Your Head on My Shoulder)" – alan Jun 06 '12 at 09:33
  • +1 for suggesting something other than the OP's preferred method. – JimmyPena Jun 06 '12 at 12:55
  • Yeah, I appreciate the different approach, but I think I still prefer regex. I don't know about its efficiency (speed is not my greatest concern), but I like the compact notation. My main concern with this method is that it doesn't seem very specific. The left boundary is established as the last `"` of the string, so if the artist name contains any quote this will cause problems. That's why I still prefer to use `" (` as the left boundary. – Daan Jun 06 '12 at 13:55
  • Thanks for the feedback, it is appreciated. The solution that solves the problem is the best one, regardless of the exact implementation. As you can see, there are several ways to reach your goal (extracting a substring). Focus on the goal, rather than a particular way of reaching it. Requiring that your goal be reached only by a specific path limits your options. – JimmyPena Jun 06 '12 at 16:02
  • @KeyMs92: What if the artist name contains `" (`? My point is, you have to define your problem precisely, otherwise any solution, regex or not, will have false positives / false negatives. – Jean-François Corbett Jun 07 '12 at 07:45
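Daan's preferred left boundary, `" (`, can also be used with plain string functions. A minimal sketch, assuming the closing quote of the title is followed by exactly one space (regular or non-breaking) and then `(`; the name `ArtistAfterQuoteParen` is hypothetical. As Jean-François points out, it will still misfire if an artist name itself contains `" (`.

```vba
' Hypothetical helper: returns the text between the first `" (` (closing
' quote, one space, opening parenthesis) and the next ")".
' Non-breaking spaces (ChrW(160)) are first replaced with regular spaces.
Function ArtistAfterQuoteParen(ByVal s As String) As String
    Dim iStart As Long
    Dim iEnd As Long

    s = Replace(s, ChrW(160), " ")

    iStart = InStr(s, """ (")
    If iStart = 0 Then Exit Function          ' no `" (` found: return ""
    iStart = iStart + Len(""" (")             ' first character inside the parentheses

    iEnd = InStr(iStart, s, ")")
    If iEnd = 0 Then Exit Function            ' no closing ")": return ""

    ArtistAfterQuoteParen = Mid(s, iStart, iEnd - iStart)
End Function
```

Because the search anchors on the quote before the space, `"Don't Talk (Put Your Head on My Shoulder)"` yields an empty string instead of a false match, and on the "They Called It Rock" line only the first parenthesised part is returned.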
3

This is a nice regex:

".*\(([^)]*)

In VBA/VBScript:

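' Early-bound types: requires a reference to Microsoft VBScript Regular Expressions 5.5
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Dim myMatch As Match
Dim ResultString As String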
Set myRegExp = New RegExp
myRegExp.Pattern = """.*\(([^)]*)"
Set myMatches = myRegExp.Execute(SubjectString)
If myMatches.Count >= 1 Then
    Set myMatch = myMatches(0)
    If myMatch.SubMatches.Count >= 3 Then
        ResultString = myMatch.SubMatches(3-1)
    Else
        ResultString = ""
    End If
Else
    ResultString = ""
End If

This matches

Put Your Head on My Shoulder

in

"Don't Talk (Put Your Head on My Shoulder)"  

Update 1

I let the regex loose on your doc file and it matches as requested, so I'm quite sure the regex is fine. I'm not fluent in VBA/VBScript, but my guess is that's where it goes wrong.

If you want to discuss the regex further, that's fine with me. I'm not eager to start digging into this VBScript API, which looks arcane.

Given the new input, the regex is tweaked to

".*".*\(([^)]*)

so that it doesn't falsely match (Put Your Head on My Shoulder), which appears inside the quotes.


buckley
  • Thanks for your response. Unfortunately there don't seem to be any matches using this pattern. Let me give you the source I'm testing this on: http://tiny.cc/ij3ffw. – Daan Jun 05 '12 at 19:39
  • @KeyMs92 The examples on that webpage are clearer. I updated my answer – buckley Jun 05 '12 at 19:48
  • Yeah, I should have given a better example. See my OP. – Daan Jun 05 '12 at 19:54
  • My regex matches the string "B. Wilson/Asher/Love" in group 1. Let me know if you have any more questions. – buckley Jun 05 '12 at 20:00
  • It seems the problem is with the regex itself. Using `Match` doesn't work in any case. I've uploaded my docm file in one of the comments so you can have a look. – Daan Jun 05 '12 at 22:03
  • It gives me an error for some reason :/ I think it's indeed an issue with VBA and perhaps with how text is pasted into Word. As Alan found out, the space that is pasted into Word is actually a non-breaking space. – Daan Jun 06 '12 at 13:57
  • thank you @buckley, I'm confused by this line, specifically the 3-1 (three minus one): `ResultString = myMatch.SubMatches(3-1)`. I have no idea what is going on here. What exact result is it producing? – BenKoshy May 21 '16 at 01:37
  • @BKSpurgeon That's an old answer and I can't remember the reasoning behind it. Reading the documentation, SubMatches(0) holds the text matched by the first capturing group, so 3-1=2 would return the third capture group. But then again, the regex has only one capture group, so that does not make sense. I can't remember why I did this 4 years ago :) – buckley May 22 '16 at 17:27
  • @buckley thanks - but generally speaking, how will the third capture group match if you are after the first one? – BenKoshy May 22 '16 at 22:18
  • Good question. I can't see this working either, but maybe VBScript is forgiving and returns the last group if you ask for a non-existing group with a higher number than the highest group. You'll have to test it. Or it may just not be correct, as I can't remember how thoroughly I tested it. – buckley May 22 '16 at 23:00
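`SubMatches` is zero-based with one entry per capturing group, so group n lives at index n - 1. Because the pattern in this answer has a single capturing group, `myMatch.SubMatches.Count` is 1, the `>= 3` check is never true, and `ResultString` stays empty. The helper below is a hypothetical sketch (late bound via `CreateObject("VBScript.RegExp")`, so it assumes Windows) that takes a 1-based group number and guards against groups that do not exist.

```vba
' Hypothetical helper (late bound, assumes Windows): returns the text of
' capturing group groupNumber (1-based, as in regex notation) from the
' first match, or "" if there is no match or no such group.
Function CapturedGroup(ByVal subject As String, ByVal regexPattern As String, _
                       ByVal groupNumber As Long) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = regexPattern
    Set matches = re.Execute(subject)
    If matches.Count = 0 Then Exit Function

    ' SubMatches.Count equals the number of capturing groups in the pattern
    If groupNumber < 1 Or groupNumber > matches(0).SubMatches.Count Then Exit Function

    ' SubMatches is zero-based: group 1 is at index 0, group n at index n - 1
    CapturedGroup = matches(0).SubMatches(groupNumber - 1)
End Function
```

With the tweaked pattern `".*".*\(([^)]*)` (written `""".*"".*\(([^)]*)"` as a VBA literal), asking for group 1 returns `B. Wilson/Asher/Love`, while asking for group 3 returns an empty string because that group does not exist.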
2

This function worked on your example string:

Function GetArtist(songMeta As String) As String
  Dim artist As String
  ' split string by "(" and take the last portion
  artist = Split(songMeta, "(")(UBound(Split(songMeta, "(")))
  ' remove closing parenthesis
  artist = Replace(artist, ")", "")
  ' return the result
  GetArtist = artist
End Function

Ex:

Sub Test()

  Dim songMeta As String

  songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"

  Debug.Print GetArtist(songMeta)

End Sub

prints "B. Wilson/Asher/Love" to the Immediate Window.

It also solves the problem alan mentioned. Ex:

Sub Test()

  Dim songMeta As String

  songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"

  Debug.Print GetArtist(songMeta)

End Sub

also prints "B. Wilson/Asher/Love" to the Immediate Window. Unless of course, the artist names also include parentheses.

Community
JimmyPena
1

Here is another regex, tested with VBScript: (?:\()(.*)(?:\))


Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
wscript.echo Extract(Data)
'---------------------------------------------------------------
Function Extract(Data)
Dim strPattern,oRegExp,Matches
strPattern = "(?:\()(.*)(?:\))"
Set oRegExp = New RegExp
oRegExp.IgnoreCase = True 
oRegExp.Pattern = strPattern
set Matches = oRegExp.Execute(Data) 
If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
End Function
'---------------------------------------------------------------
Hackoo
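Note that `(.*)` is greedy: it runs from the first `(` to the last `)` on the line. On the "They Called It Rock" line quoted in the comments on the first answer, it therefore captures both parenthesised parts, whereas a negated class `[^)]*` stops at the first `)`. A small sketch to compare the two; `FirstSubMatch` is a hypothetical helper and, being late bound, assumes Windows.

```vba
' Hypothetical helper (late bound, assumes Windows): returns the first
' capturing group of the first match, or "" if nothing matches.
Function FirstSubMatch(ByVal subject As String, ByVal regexPattern As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = regexPattern
    Set matches = re.Execute(subject)
    If matches.Count > 0 Then FirstSubMatch = matches(0).SubMatches(0)
End Function

Sub CompareGreedyAndNegatedClass()
    Dim s As String
    s = """They Called It Rock"" (Lowe, Rockpile, Dave Edmunds) - 3:10 " & _
        "(bonus single-sided 45, credited as Rockpile, not on original LP)"

    ' Greedy: (.*) runs to the LAST ")" on the line
    Debug.Print FirstSubMatch(s, "(?:\()(.*)(?:\))")
    ' Negated class: [^)]* stops at the FIRST ")" after the "("
    Debug.Print FirstSubMatch(s, "\(([^)]*)\)")
End Sub
```

The first `Debug.Print` should show `Lowe, Rockpile, Dave Edmunds) - 3:10 (bonus single-sided 45, credited as Rockpile, not on original LP` and the second `Lowe, Rockpile, Dave Edmunds`. On the single-parenthesis example from the question both patterns return the same result.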
0

I think you need a better data file ;) You might want to consider pre-processing the file into a temp file, modifying the outliers that don't fit your pattern so that they do. It's a bit time-consuming, but it's always difficult when a data file lacks consistency.

Lawrence Knowlton
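One possible pre-processing step, sketched under the assumption that the outliers are whitespace and typographic quotes: Word commonly substitutes curly quotes for straight ones and, as found in the first answer, the pasted text contained non-breaking spaces. `NormalizeLine` is a hypothetical name, and the exact replacements you need depend on your data.

```vba
' Hypothetical pre-processing step: bring a line into the expected shape
' before matching. The replacements below are assumptions about typical
' Word outliers; adjust them to whatever your data actually contains.
Function NormalizeLine(ByVal s As String) As String
    s = Replace(s, ChrW(8220), """")   ' left curly quote  -> straight quote
    s = Replace(s, ChrW(8221), """")   ' right curly quote -> straight quote
    s = Replace(s, ChrW(160), " ")     ' non-breaking space -> regular space
    s = Replace(s, vbTab, " ")         ' tab -> regular space

    ' collapse runs of spaces into a single space
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop

    NormalizeLine = Trim$(s)
End Function
```

Running this over each line (or over a temp copy of the file) lets a single strict pattern, such as one anchored on `" (`, handle lines that differ only in these characters.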