1

I'm looking to parse out a specific piece of text from long unstructured text. The section I want to capture always has an "x" with integers on both the left and right of it.

Here is my formula:

=IFERROR(SUBSTITUTE(RIGHT(LEFT(G2,FIND("x",G2)-1),FIND("_",G2)-3)&MID(G2,FIND("x",G2),FIND("_",G2)-2),"_",""),"1x1")

and another version I tried to handle spaces with an OR statement (that doesn't work)

=IFERROR(SUBSTITUTE(RIGHT(LEFT(G4,FIND("x",G4)-1),FIND(OR("_"," "),G4)-3)&MID(G4,FIND("x",G4),FIND("_",G4)-2),"_",""),"1x1")

Raw Text - My Formula's Result - Desired Result

Q1-Q4_Year_Source_Type_P_LongName_300x250_Target_Server 300x250 300x250
Q1-Q4_Year_Client_Client Year_Type_P_LongName_1600x1000_Site_Server 600x100 1600x1000
02.04 Search Sponsorship - 728x90   1x1 728x90
Some Website_300x600 ROS Display    ebsite300x600 ROS Di    300x600

Ideally, if I could just get the MID formula MID(G2,FIND("x",G2),FIND("_",G2)-2) to read from right to left instead of left to right, I think I would be in good shape.

Thanks.

nihaodave
  • 11
  • 4

3 Answers3

0

There is a standard formula for extracting a number from a string (found in many variations):-

=LOOKUP(99^99,--("0"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$15))))

so you could use this starting a few characters to the left of the "x" and then from the "x" itself. Suggest using helper cells for these to avoid a long formula so if your original string is in A1:-

=mid(A1,find("x",A1)-5,999)   in B1

=mid(A1,find("x",A1),999)     in C1

then for the first number in D1

=LOOKUP(99^99,--("0"&MID(B1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},B1&"0123456789")),ROW($1:$15))))

and for the second number in E1

=LOOKUP(99^99,--("0"&MID(C1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},C1&"0123456789")),ROW($1:$15))))

and concatenate them together:-

=D1&"x"&E1

enter image description here

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
0

Here is a complicated formula that

  • Splits the string into an array of individual words, separated by space or underscore
  • looks for a word that matches the pattern nnnxnnn defined as:
    • Starts with a number
    • followed by a lower case x (if the x could be either case, replace FIND with SEARCH in the formulas below)
    • ends with a number
  • The function will return the last word in the string that matches that pattern.

The formula includes several "sub formulas"

We split the string by both space and underscore into an array of words:

=TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99))

In the above SEQ is a Named Formula: ( Formulas ► Define Name)

    =IF((ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))-1)*99=0,1,(ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))-1)*99)

That formula generates a series of numbers 1,99,198,297, ... which provides a good starting point for the MID function in the first formula.

We then use LEFT and MID functions to find the words that contain an x, and have numbers before and after the x

ISNUMBER(-LEFT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))-1))

ISNUMBER(-MID(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))+1,99)))

Multiplying those two formulas will return an array of 0's and 1's for words that match or do not match the pattern.

1/(...)

will then return an array of 1 or DIV/0 errors.

Using the vector form of LOOKUP will then return the value in our string array that is in the same position as the match in our pattern matching array.

=LOOKUP(2,1/(ISNUMBER(-LEFT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))-1))*ISNUMBER(-MID(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))+1,99))),TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))

enter image description here

I would note that using VBA and Regular Expressions, that same pattern can be expressed a \d+x\d+

and a User Defined Function can be used accomplishing the same thing, taking a fraction of the time to devise once you are fluent;

Option Explicit
Function ExtractMeasure(S As String) As String
    Dim RE As Object, MC As Object

Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = "\d+x\d+"
    .Global = False
    .ignorecase = False 'Case Sensitive
    If .test(S) = True Then
        Set MC = .Execute(S)
        ExtractMeasure = MC(0)
    End If
End With
End Function
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
-1

In this case, it is best to use regular expressions in EXCEL. Please refer to below post in using Regular expressions in EXCEL. But, you have to use VBA for this. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Community
  • 1
  • 1
Venkataraman R
  • 12,181
  • 2
  • 31
  • 58