-3

I'm trying to extract all the text between 2 words in a cell full of text, in Google Sheets and can't find something that works with Regextract ... everything returns an error

I have tried :

=REGEXEXTRACT(B2,"Word1(.*)Word2")

any other suggestions ?

The only formula that works it's a long formula FIND

=TRIM(MID(SUBSTITUTE(B2,"Word1","|",1),FIND("|",SUBSTITUTE(B2,"Word1","|",1))+1,FIND("Word2",SUBSTITUTE(B2,"Word1","|",1))-FIND("|",SUBSTITUTE(B2,"Word1","|",1))-1))

The text inside the cell has also carriage returns ...does it matter ?

Thank you.

Barmar
  • 741,623
  • 53
  • 500
  • 612
Ovi
  • 1
  • 3
  • That looks like it should work according to the [documentation](https://support.google.com/docs/answer/3098244?hl=en). Is it really `Word1` and `Word2` in the regexp? If not, show the actual regular expression, you may have some special characters that have to be escaped. – Barmar Mar 08 '23 at 20:48
  • Instead of Word1 and Word2 I'm using some localized words based on the content , nothing special in them ... just simple letters ... but it doesn't work ... I've updated the info above with the formula that works ... maybe it can help you – Ovi Mar 08 '23 at 20:57
  • Didn't you see how I fixed the code markdown in your question the first time? Why couldn't you do that yourself when you added additional code, instead of me having to fix it again? – Barmar Mar 08 '23 at 20:58
  • The error message clearly says there's smoething wrong with the regexp, so it's probably not as simple as you claim. Post the actual code. – Barmar Mar 08 '23 at 20:58
  • Why do you need all the substitutions, instead of just `FIND("Word1", B2)` and `FIND("Word2", B2)`? – Barmar Mar 08 '23 at 21:01
  • @Barmar I've discovered the problem while testing with something simple ... Word2 is on another row inside the cell like when you hit Command + ENTER so it's after a carriage return ... does regex still works in this case ? ...if so, how can I make it work ? and please also tell me how can I escape any special characters ... that will really help me on the other texts I have ... thanks. – Ovi Mar 08 '23 at 21:01
  • 1
    `.*` doesn't match across newlines. Use `[\s\S]*` to match any characters including newlines. – Barmar Mar 08 '23 at 21:02
  • @Barmar It worked like a charm ... Thank you very much ... If you can I would also want to ask the following : 1. if instead of Word1, I have something like this "Word 12:" ... in this case I should escape somehow those characters like space and : ? ... 2 . Is there a possibility to use regex and tell it to extract all text after Word1 and until a newline ? – Ovi Mar 08 '23 at 21:07
  • 1. You don't need to escape any of those characters. See https://github.com/google/re2/blob/main/doc/syntax.txt for the special characters that need to be escaped. – Barmar Mar 08 '23 at 21:11
  • `Word1(.*)` Since `.*` stops matching at newline, this extracts everything until newline. – Barmar Mar 08 '23 at 21:12

1 Answers1

0

. doesn't match newlines. To match a multi-line string, use [\s\S]. \s matches whitespace, \S matches non-whitespace, and together in the character set they match anything.

=REGEXEXTRACT(B2,"Word1([\s\S]*)Word2")
Barmar
  • 741,623
  • 53
  • 500
  • 612