2

Folks, there are already billions of questions on "regex: match everything, but not ...", but non seems to fit my simple question.

A simple string: "1 Rome, 2 London, 3 Wembley Stadium" and I want to match just "1 Rome, 2 London, 3 Wembley Stadium", in order to extract only the names but not the ranks ("Rome, London, Wembley Stadium").

Using a regex tester (https://extendsclass.com/regex-tester.html), I can simply match the opposite by:

([0-9]+\s*) and it gives me:

"1 Rome, 2 London, 3 Wembley Stadium".

But how to reverse it? I tried something like:

[^0-9 |;]+[^0-9 |;], but it also excludes white spaces that I want to maintain (e.g. after the comma and in between Wembley and Stadium, "1 Rome, 2 London, 3 Wembley Stadium"). I guess the "0-9 " needs be determined somehow as one continuous string. I tried various brackets, quotation marks, \s*, but nothing jet.

Note: I'm working in a visual basic environment and not allowing lookbehinds!

MsGISRocker
  • 588
  • 4
  • 21
  • Try `\d+\s*(.*?)(?:,|$)` and get the values from `match.Submatches(0)`. See [the regex demo](https://regex101.com/r/cJ8Wvi/1). – Wiktor Stribiżew Jul 21 '21 at 14:45
  • If your string is really that simple, you can also use `Split` function to get the array like `Split(simpleStr,",")` then use `Right` with Instr to check for the first white space in each item. – Raymond Wu Jul 21 '21 at 14:52
  • Or, `\d+\s*(.*?)(?=,\s*\d+\s|$)`, see [this regex demo](https://regex101.com/r/cJ8Wvi/2). – Wiktor Stribiżew Jul 21 '21 at 14:54
  • ````\d+\s*(.*?)(?=,\s*\d+\s|$)```` matches everything but "````, ````", but I need to match everything, but "````1 ````", "````2 ````", ... (be aware of the white space behind the numbers) – MsGISRocker Jul 21 '21 at 15:03
  • 1
    No, `\d+\s*(.*?)(?=,\s*\d+\s|$)` ***captures*** what you need in Group 1. It does not matter what it *matches*. – Wiktor Stribiżew Jul 21 '21 at 15:04
  • I'm trying to get my head around. It does not so in: https://extendsclass.com/regex-tester.html#js. Why that? – MsGISRocker Jul 21 '21 at 15:14

1 Answers1

1

You can use

\d+\s*(.*?)(?=,\s*\d+\s|$)

See the regex demo, get the values from match.Submatches(0). Details:

  • \d+ - one or more digits
  • \s* - zero or more whitespaces
  • (.*?) - Group 1: zero or more chars other than line break chars as few as possible
  • (?=,\s*\d+\s|$) - a positive lookahead that requires ,, zero or more whitespaces, one or more digits and then a whitespace OR end of string immediately to the right of the current location.

Here is a demo of how to get all matches:

Sub TestRegEx()
    Dim matches As Object, match As Object
    Dim str As String

    str = "1 Rome, 2 London, 3 Wembley Stadium"
    
    Set regex = New regExp
    regex.Pattern = "\d+\s*(.*?)(?=,\s*\d+\s|$)"
    regex.Global = True

    Set matches = regex.Execute(str)
    
    For Each match In matches
        Debug.Print match.subMatches(0)
    Next
End Sub

Output:

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • @ Wiktor, I know it is frustrating to discuss with people that have no clue of what they are talking about, but I cannot reproduce it. 1. I tried the regex in: https://extendsclass.com/regex-tester.html#js, but it includes the numbers. 2. I tried to run your macro in Excel, but get a "Compile error: User defined type not defined" and 3. I tried the regex within a VBA function (https://stackoverflow.com/questions/68349611/vba-regex-extract-multiple-strings-between-strings-within-excel-cell-with-custo) and it includes the numbers as indicated by: https://extendsclass.com/regex-tester.html#js. – MsGISRocker Jul 22 '21 at 09:10
  • @MsGISRocker 1) If you do not know how to use the regex from online regex testers, do not use them, rely on what experts advise (really, a [lot of people do not understand how to use online regex testers](https://stackoverflow.com/q/39636124/3832970)). 2) Try googling these errors, [this is the solution](https://stackoverflow.com/questions/21139938/vba-regexp-causes-compile-error-while-vbscript-regexp-works), 3) Why did you try to use the code that is not meant to be used with **my** regex? Use ***my code*** after referencing the right library. – Wiktor Stribiżew Jul 22 '21 at 09:25
  • Ok, I got your script running and yes, it actually does what I asked for. However, it does so only within your script, but not in my function that I would like it to work in (https://stackoverflow.com/questions/68349611/vba-regex-extract-multiple-strings-between-strings-within-excel-cell-with-custo). Is there any more generally applicable solution? BTW, I owe u a coffee! – MsGISRocker Jul 23 '21 at 13:07
  • @MsGISRocker At Stackoverflow, we can only answer what was asked. If you post your code, I can try to accommodate the solution to fit your code. Without that, I can only suggest what I have. – Wiktor Stribiżew Jul 23 '21 at 13:12
  • 1
    Did so! (https://stackoverflow.com/questions/68500504/regex-match-everything-but-not-a-certain-multiletter-string-within-vba-excel-f) – MsGISRocker Jul 23 '21 at 14:06