1

I need to use RegEx to find a string in Word document. The string contains numbers{1;5} with one space, two spaces or no space at all followed by / and four numbers{4} which denotes a year in which the document was created.

The strings I need to find are like these:

  • 45 /2017
  • 125 /2019
  • 1245 /2018
  • 12577 /2019
  • 37589 /2017

but sometimes there is no space in between:

  • 45/2017
  • 125/2019
  • 1245/2018
  • 12577/2019
  • 37589/2017

and sometimes there is two spaces:

  • 45 /2017
  • 125 /2019
  • 1245 /2018
  • 12577 /2019
  • 37589 /2017

but it's always in the same place - the space or two spaces. It's always after the first string of numbers, which are never more than 5 digits - therefore I type [0-9]{1;5} in RegEx.

But when I tried to build this RegEx expression in Word's Ctrl+H find and replace dialog box, and I test it against the Word's document I cannot build universal expression to find these strings with space and no space. The ? is not working for me. I've read here that ? should find one space or no space, but when I'm testing it in my Word document it finds a string with two spaces inside, but not the one with one space or no space at all.

Here are the RegExs I've tried, which failed to match both strings with no space inside like 4125/2019, as well as similar strings with one or two spaces:

  1. [0-9]{1;5} ?/[0-9]{4}

  2. [0-9]{1;5}\s?/[0-9]{4}

  3. [0-9]{1;5}[ ]{0;2}/[0-9]{4} this gives an error but [0-9]{1;5}[ ]{1;2}/[0-9]{4} finds 125 /2019 but doesn't find 125/2019 and 12577/2019.

In Polish version of MS Word/Excel we use ; instead of , in formulas, but I tried it all with colon or semi-colon interchangeably.

I've read these sources: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

and

Regex- Space or no space

Analyst_Cave

Regular Expression Tester

michal roesler
  • 479
  • 2
  • 9
  • 26
  • 1
    Never try to use RegEx to manipulate Word data inside a document - it can't work. You need to use Word's built-in Find functionality with wildcards (Word's version of RegEx). – Cindy Meister Feb 15 '20 at 17:08
  • So Word's find and replace wildcards and VBA RegEx is something different? Is the syntax different? – michal roesler Feb 15 '20 at 20:39
  • 1
    Syntax is different (not as powerful) but the reason is that the result remains *in the context of the Word document*. RegEx can only work on a string, which will have no reference to the context of the document. – Cindy Meister Feb 16 '20 at 16:16

2 Answers2

2

Try

([0-9]{1,5})([ ]{0,4})([/])([0-9]{4})

or

([0-9]{1,5})([\s]{0,4})(/[0-9]{4})

According to the test, all three are good.

enter image description here

Dy.Lee
  • 7,527
  • 1
  • 12
  • 14
  • If the space is between both numbers strings coded as ```([ ]{0,4})``` or ```([\s]{0,4})``` than why the first digits sequence you wrote ```([0-9]{1,7})``` in both cases, which suggests there can be up to seven digits, even though I said its never more than 5 digits? The first number is always below 99999. – michal roesler Feb 15 '20 at 21:53
  • 1
    @michal roseler, I missed. Crrect that to {1,5}. – Dy.Lee Feb 15 '20 at 23:41
  • Thank you for that answer @Dy.Lee. It solves my problem and it finds the desired strings flawlessly. I just picked this other answer by macropod because he showed me, that I don't need Regex at all to achieve the goal of the macro I'm writing now. His solution will make my whole code simpler. But your answer works fine and I'll use it in the future for sure. Thx. – michal roesler Feb 16 '20 at 18:43
1

You can use a Word wildcard Find expression like:

<[0-9 ]{1;7}/[0-9]{4}>

without the need to involve the RegEx library.

macropod
  • 12,757
  • 2
  • 9
  • 21