1

I hope that you can help me.

I have a data set which includes a free text comments field; within that field there are work order numbers entered which I want to count - the WO numbers are ALWAYS 8 digits long and I thought that I had cracked it with this: =ROUNDDOWN(SUMPRODUCT(LEN(M7)-LEN(SUBSTITUTE(M7, {0,1,2,3,4,5,6,7,8,9},"")))/8,0) which correctly identifies the WO numbers in a comment like this:

NWMS - 67431710 - Mattress eta pending from contractor
NWMS - 67431797 - Double base eta pending from contractor

and returns a count of 2 (that's a single cell with multi-line comments).

Also; because of the ROUNDDOWN formula; a comment like this: CMS - 67630485 smell From waste outlets or blockage eta 05.02 only returns a count of 1 WO (which is correct)

However; the new problem I am facing is when an ETA date is added which also has 8 digits - e.g. 07/02/2020 - this incorrectly adds 1 to the WO number count.

I guess that I'm after a way of counting blocks of 8 contiguous numbers in a larger text string. This will exclude any other digits however they are entered.

I'm working in Google Sheets but I understand Excel better so a formula for either will help me as I should be able to convert it either way.

player0
  • 124,011
  • 12
  • 67
  • 124

4 Answers4

3

You could try using regular expressions (in Google Sheets) to match an eight-digit number.

For example, the following will replace all groups of eight-digit numbers with @.

=REGEXREPLACE(M7,"\d{8}","@")

And then you could use the len(M7) - len(substitute(... method to count the number of occurrences of eight-digit numbers in the string.

The final formula would look something like

=len(REGEXREPLACE(M7,"\d{8}","@"))-len(SUBSTITUTE(REGEXREPLACE(M7,"\d{8}","@"),"@",""))
ADW
  • 4,177
  • 1
  • 14
  • 22
  • 1
    This looks great; I guess the main difference between REGEXREPLACE and SUBSTITUTE is that you can search for blocks ({8}) rather than just a single digit which is exactly what I needed. Out of interest, is there an Excel equivalent? – Alan Englefield Feb 07 '20 at 12:21
  • I don't think so. I don't really use Excel anymore. So if a new formula was recently introduced, I wouldn't know. :-) – ADW Feb 07 '20 at 12:27
  • @AlanEnglefield You'd have to write a UDF in VBA to be able to use Regex expressions on the worksheet. – Ron Rosenfeld Feb 07 '20 at 19:01
  • @RonRosenfeld Is that something you can advise on? Also; would it go with the file if I emailed it to someone else? Thanks – Alan Englefield Feb 10 '20 at 12:31
  • @AlanEnglefield Look at the accepted answer for [How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops](https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops). So far as distributing it, you'd probably want to change the recommendation from early binding to late binding, so your users won't have to tick the reference. – Ron Rosenfeld Feb 10 '20 at 13:59
3

In MS Excel you can use FILTERXML function to get necessary result:

=SUMPRODUCT(ISNUMBER(FILTERXML("<data><a>" & SUBSTITUTE(A1," ","</a><a>") & "</a></data>","//a"))*(LEN(FILTERXML("<data><a>" & SUBSTITUTE(A1," ","</a><a>") & "</a></data>","//a"))=8))

Only spaces should be without duplication.

enter image description here

Edit #1:

Solution for leading zeros and duplicate spaces:

=SUMPRODUCT(ISNUMBER(FILTERXML("<data><a>" & SUBSTITUTE(A1," ","</a><a>") & "</a></data>","//a"))*(LEN(FILTERXML("<data><a>#" & SUBSTITUTE(A1," ","</a><a>#") & "</a></data>","//a"))=9))

Edit #2:

Solution to exclude time too:

=SUMPRODUCT(ISNUMBER(FILTERXML("<data><a>" & SUBSTITUTE(SUBSTITUTE(A5,":","_")," ","</a><a>") & "</a></data>","//a"))*(LEN(FILTERXML("<data><a>#" & SUBSTITUTE(A5," ","</a><a>#") & "</a></data>","//a"))=9))
basic
  • 11,673
  • 2
  • 9
  • 26
  • Because of the way `FILTERXML` handles numeric data, this method will return unreliable results if the **WO** numbers have leading zero's. – Ron Rosenfeld Feb 07 '20 at 14:23
  • @Ron Rosenfield Yes, but it can be easily solved. Edited. – basic Feb 07 '20 at 14:33
  • @basic This is brilliant - thanks however; I've used your edited example on the following and I get a count of 2?? 'mde - 67832405 New Temporary Barrel Required - lock seized,cant access room - 09-Feb-2020 08:55:43' is it because of the time element? – Alan Englefield Feb 10 '20 at 11:22
  • Yes, it's because of the time element. Well, we can add another replacement - replace ":" with "_" in the ISNUMBER test. Edited. – basic Feb 10 '20 at 13:33
  • Thanks @basicbut unfortunately; your formulas miss any work order numbers which have dashes attached. As far as I understand your formulae, they are looking for a block of 8 digits **surrounded by spaces** but, as this is fed by a free text field, that is not always the case :o( – Alan Englefield Feb 10 '20 at 15:23
0

shorter Google Sheets alternative:

=INDEX(COUNTIF(SPLIT(REGEXREPLACE(A1, "\d{8}", "♦♥"), "♦"), "*♥*"))

0

player0
  • 124,011
  • 12
  • 67
  • 124
0

Since you asked about using Regular Expressions in Excel, here is simple UDF which will return a Count of Pattern in String.

It should be portable to other versions of Windows Excel. It will not work on Mac Excel as that does not have the required dll

Option Explicit
Function REGEXCount(S As String, sPattern As String) As Long
    Dim RE As Object, MC As Object

Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .MultiLine = True
    .Pattern = sPattern
    Set MC = .Execute(S)
End With

REGEXCount = MC.Count
End Function

You can use this on your worksheet like:

=REGEXCount(cell_ref or string,"\b\d{8}\b")

The assumption is that your eight digits are surrounded by a non-word character or the beginning or end of the line/string.

Word characters are anything in the set of [A-Za-z0-9_]

If the WO might be preceded or followed by an underscore, then just change the pattern to: (?:[^A-Za-z0-9]|^)\d{8}(?:[^A-Za-z0-9]|$)

The UDF could be made more flexible, but this will work for your example.

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • thanks @Ron that works a treat - the 1st pattern you quoted also counts WO numbers preceded or followed by a dash too. I'm not sure if I can distribute a Macro-enabled excel file within our organisation but your answer does work!! thanks again – Alan Englefield Feb 10 '20 at 16:16
  • @AlanEnglefield Yes, both will do that since a dash or hyphen is a non-word character. Is that what you want? or not? If not, please provide examples of data and your expected outcome. – Ron Rosenfeld Feb 10 '20 at 16:31
  • that's exactly what I wanted. Basically I need to extract the **number** of work orders hidden in amongst all of the comments and ETAs however they appear (and given that it is a free text field; they are usually surrounded by a load of guff that I don't need haha) thanks again – Alan Englefield Feb 11 '20 at 09:06