2

Very simple formulas following but I am missing some understanding and it seems frustratingly simple.

Very simple text extraction:

MID(A1,Start Num, Num of Chars)

A simple formula text finding formula,

SEARCH(Find_text, within_text, start_num)

Combined these two formulas can find and extract text from a field between two text characters, for instance 'underscores', 'parentheses' 'commas'

So for example to extract

text to extract >>> Jimbo Jones

from a cell containing parentheses an easy formula would be;

Sample text A1 = Incident Report No.1234, user (Jimbo Jones) Status- pending

formula;

=MID(A1, SEARCH("(", A1)+1, SEARCH(")", A1) - SEARCH("(", A1) -1)

Extracted text = Jimbo Jones

The logic is simple

1.Identify cell containing text

2.Find the start number by nominating a first searchable character

3.Find the end number of the text being extracted by searching for the second searchable character

4.Subtracting the Start Number from the End number gives the number of characters to extract

Without using Search formula the code is;

MID=(A1,32,11) = Jimbo Jones

But if i want to extract text between commas or other identical characters (such as quotation marks, apostrophes, asterisk ) I need to use this formula following (which I found suggested)

=MID(A1, SEARCH(",", A1)+1, SEARCH(",", A1, SEARCH(",", A1) +1) - SEARCH(",",A1) -1)

Sample text A1 Incident Report No.1234 user, Jimbo Jones, Status- pending

Extracted text = Jimbo Jones

But I how do i nominate other boundaries, such as text between 3rd and 4th comma for example?

Sample text A1 Incident Report, No.1234, user, Jimbo Jones, Status- pending

The reason for my confusion is in the above formula excel finds the second iteration of the comma no matter where they are in the text yet the actual formula being used is identical to the formula finding the first comma, the count of characters seems to automatically assume somehow that I want the second comma not the first, how do i instruct the formula find subsequent iterations of commas, such as 3rd 4th or 9th?

And what am i not understanding in why the formula finds the 2nd comma?

Cheers!

  • 1
    I assume from your attempted formulas that you don't have access to O365 functions such as `TEXTAFTER`? – Jos Woolley Jan 12 '23 at 08:02
  • See how I use find(), mid() and as a bonus iferror() here: https://stackoverflow.com/a/75070705/4961700 – Solar Mike Jan 12 '23 at 08:03
  • 1
    To answer why it finds the second `,`: it looks up it's position starting from where it found the first +1 – P.b Jan 12 '23 at 08:11
  • For newer functions as per @JosWoolley, I'd suggest to use `TEXTAFTER()` and `TEXTBEFORE()`, and for older versions I'd take a look at [`FILTERXML()`](https://stackoverflow.com/q/61837696/9758194). If your question is just to extract the nth 'word/substring' between delimiters, than it has been asked before and is covered also in the linked post (2nd example 'Elements by position'). – JvdV Jan 12 '23 at 08:41
  • Thanks P.b, ok I get it, [start_num] finds the first comma then plus +1 starts searching from there. – CaptainMacro Jan 12 '23 at 09:45

2 Answers2

1

To explain what you are confused about:

At first sight it looks that it uses same formula to find 1st and 2nd searched symbol. But at second look you might notice that there is and argument start_num which tells for a formula where to start looking from. If you give first symbol location +1 (SEARCH(",", A1) +1))as that argument, formula will start looking for first search symbol in this part: ' No.1234, user, Jimbo Jones, Status- pending' and will give answer 42. You got 1st occasion place with first formula and then second occasion with formula above. Just find length by substracting and thats it.

Possible solutions:

If you have Office 365, use TEXTAFTER() and TEXTBEFORE() as others have stated where you can pass instance number as an argument:

=TEXTAFTER(TEXTBEFORE(A1,",",4),",",3)

Result: enter image description here

Then you can use TRIM() to get rid of unwanted spaces in begining and end.

If you use older version of Office you can use SUBSTITUTE() as workaround as it lets you to change nth occasion of specific symbol in text.

Choose a symbol that does not appear in your text and change 3th and 4th occasions of your searched symbol to it. Then look for them (in this example we will substitute , to #:

=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))

Little explanation: enter image description here

Formulas used in explanation column C:

C
=SUBSTITUTE(A1,",","#",3)
=SUBSTITUTE(A1,",","#",4)
=SEARCH("#",B1)
=SEARCH("#",B2)
=MID(A1,B3+1,B4-(B3+1))
Full formula:
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
Trimmed:
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1)))
user11222393
  • 3,245
  • 3
  • 13
  • 23
  • Thankyou, cheers!! This is awesome. I saw your response after grinding through the formula using MID and SEARCH just to surmount the challenge I gave myself - not an efficient way to do it, but I didn't want this simplicity to get the better of me! My coded answer following in the main thread – CaptainMacro Jan 12 '23 at 12:25
0

Thanks for the responses all, I grinded through using the two Formulas I asked about (MID and SEARCH) and I have a result.

It's not pretty nor elegant but it extracts the data as per requirement. I will benefit from the tips left here in response to my question as simpler options are available.

Requirement: Extract text between 3rd and 4th Commas using MID and SEARCH

Sample text A15

Incident Report (ammended), No.12545424234, user, Jimbo Jones, Status- pending

MID(A15,(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1)))))+2,(SEARCH(",",A15,(SEARCH(",",A15,SEARCH(",",A15)+1)+(SEARCH(",",A15)))-(SEARCH(",",A15,SEARCH(",",A15)+1)-(SEARCH(",",A15)))+1)-(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1))))))-2)

Test Extracted

Jimbo Jones

Obviously this solution works on other text, but it's also obviously not easy to quickly amend for other text locations.

Anyway, cheers again for the pointers...