0

In a .csv spreadsheet, I have multiple strings with incrementing numerical values contained in each, and I need to extract the numbers from each string. For example, here are two strings:

DEVICE1.CM1 - 4.1.1.C1.CA_VALUE (A)
DEVICE1.CM2 - 6.7.1.C2.CA_VALUE (A)
DEVICE1.CM1 - 4.1.2.C1.CA_VALUE (A) 
DEVICE1.CM1 - 4.1.2.C2.CA_VALUE (A) 
DEVICE1.CM1 - 4.1.2.C3.CA_VALUE (A) 
DEVICE1.CM1 - 5.1.1.C1.CA_VALUE (A) 
DEVICE1.CM1 - 5.1.1.C2.CA_VALUE (A) 
DEVICE1.CM1 - 5.10.1.C3.CA_VALUE (A) 
DEVICE1.CM1 - 6.13.1.C10.CA_VALUE (A)

And I am looking to extract "4.1.1.C1" from the first string, and "6.7.1.C2" from the second string.

I have over 1000 strings, each with a different incremental value in the form of "#.#.#.C.#" and all of the options I have tried so far involve searching for a specific value to extract, rather than extracting all values of that general form. Is there any reasonable way to accomplish this?

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops – BigBen Nov 19 '21 at 16:33
  • 1
    Two pieces of data isn't quite enough to come up with a reliable method. Are the values always preceded by " - " and followed by "_VALUE"? – cybernetic.nomad Nov 19 '21 at 16:33
  • Use mid() with find() to get the " - " and " _ ". Then copy paste values. – Solar Mike Nov 19 '21 at 16:38
  • Yes, they are always preceeded by " - " and followed by ".CA_VALUE (A)". Here are some more examples: DEVICE1.CM1 - 4.1.2.C1.CA_VALUE (A) DEVICE1.CM1 - 4.1.2.C2.CA_VALUE (A) DEVICE1.CM1 - 4.1.2.C3.CA_VALUE (A) DEVICE1.CM1 - 5.1.1.C1.CA_VALUE (A) DEVICE1.CM1 - 5.1.1.C2.CA_VALUE (A) DEVICE1.CM1 - 5.10.1.C3.CA_VALUE (A) DEVICE1.CM1 - 6.13.1.C10.CA_VALUE (A) Let me know if I can provide any more details that would help. – Patrick Coffey Nov 19 '21 at 16:41
  • 1
    You could also use PowerQuery for this. Load your csv, transform data and load into Excel. – JvdV Nov 19 '21 at 16:47
  • As @SolarMike said, use a combination of `FIND` and `MID` to get the middle of the string. If your text is in cell A1 try `=MID(A1,FIND(" - ",A1,1)+3,FIND("_",A1,FIND(" - ",A1,1))-FIND(" - ",A1,1)-6)`. – PeterT Nov 19 '21 at 16:51
  • @PatrickCoffey Allow me a hint as new participant: You got several answers - it's good use and also helpful for other readers to mark one of them (your preferred one) as accepted if you found it helpful (acceptance is indicated by a colored checkmark next to the answer). C.f. ["Someone answers"](https://stackoverflow.com/help/someone-answers) – T.M. Nov 21 '21 at 19:00

3 Answers3

3

I am not a big fan of regular expressions because they are often hard to read, but this is a typical example where you should use them. Read carefully the Q&A BigBen linked to in the comments.

Function extractCode(s As String) As String
    Static rx As RegExp
    If rx Is Nothing Then Set rx = New RegExp
    rx.Pattern = "\d+\.\d+\.\d+\.C\d"
    If rx.Test(s) Then
        extractCode = rx.Execute(s)(0)
    End If
End Function

(You will need to add the reference to the Microsoft VBScript Regular Expression library)

--> Updated my answer, you need to escape the dot, else it is a placeholder for any character and the pattern would also match something like "4x1y2zC3",

FunThomas
  • 23,043
  • 3
  • 18
  • 34
  • 1
    I'm quite a newb when it comes to regex. What do you think about `\d+\.\d+\.\d+\.C\d+`? Isn't dot without a preceding backslash just any character? I've been playing with it [here](https://regex101.com/r/Tve6J2/1). – VBasic2008 Nov 19 '21 at 17:20
  • @VBasic2008 You are right. I will update my answer. – FunThomas Nov 19 '21 at 17:34
1

So here goes: enter image description here

MID(A1,FIND("-",A1,1)+2,(FIND("_",A1,1)-FIND("-",A1,1))-5)
Solar Mike
  • 7,156
  • 4
  • 17
  • 32
1

The fixed structure

(items) are always preceeded by " - " and followed by ".CA_VALUE (A)"

allows to isolate the code string via Split as follows:

  • consider ".CA_VALUE (A)" as closing delimiter, but change occurrence(s) to "- "
  • execute Split now on the resulting string using only the first delimiter (StartDelim "- ")
  • isolate the second token (index: 1 as split results are zero-based)
Function ExtractCode(ByVal s As String) As String
    Const StartDelim   As String = "- "
    Const ClosingDelim As String = ".CA_VALUE (A)"
    ExtractCode = Split(Replace(s, ClosingDelim, StartDelim), StartDelim)(1)
End Function


Another approach with focus on splitting via point delimiters //Edit 2021-11-20

If you want to experiment with a fixed start position of your 4-items code in a split array (based on point delimiters "."), you might also consider the following approach:

  • split via point delimiters "."
  • filter only the 3rd,4th,5th and 6th item via WorksheetFunction.Index (by its columns argument)
  • join the resulting items again via connecting points "."

a) Using (Excel) version MS 365

Function ExtractCode(ByVal s As String, Optional startPos As Long = 3) As Variant
    Const delim As String = "."
    Dim tmp
    tmp = Split(Replace(s, "- ", delim), delim)   ' normalize hyphen to point delimiter
    With Application.WorksheetFunction
        ExtractCode = Join(.Index(tmp, 0, .Sequence(1, 4, startPos)), ".")
    End With
End Function

b) Make it backwards compatible

Just change the function result assignment to

        ExtractCode = Join(.Index(tmp, 0, Evaluate("{1,2,3,4}-1+" & startPos)), ".")

which in both cases changes the Index column argument to a 1-based column number Array(3,4,5,6)

T.M.
  • 9,436
  • 3
  • 33
  • 57