0

I have these long strings that have multiple substrings in them all separated by periods. The good news is I've found out how to extract most of the substrings on the left or right of the strings by using functions like left, mid, right, regexextract, find, len, and substitute, but I just can't figure out this last problem.

The problem with these substrings is sometimes some are there, sometimes none are there (most I've seen at once is, I think, 3). And other than being in all caps, which some of the other substrings that I don't want are also in, I don't think there's any regex pattern one could use except something like string1|string2|string3, etc all the way up to maybe string30.

I first thought it would be best to just have a formula look at the string, compare it to a range on another sheet, and if there was something in the range that was in the string, then show it. But I was lost on how to do that. Then I figured just put the whole range list in a regex and somehow extract any substrings that were in the string.

And that worked, but it would only extract the first substring it found whereas I wanted it to extract all the substrings it found. And while I think I'd prefer the substrings to be put into different columns (not rows) by using the Split function, I'd settle for them all being put in the same cell via the Textjoin function.

The farthest I've gotten is

=split(REGEXextract(A2,"\b(?:string1|string2|string3)\b")," ")

but like I said that only spits out the first substring it finds. And I've seen some people use REGEXreplace with Split and ArrayFormula and sometimes double REGEX functions, but I just can't seem to make those work for my purposes.

I'm doing this in GoogleSheets, but even an Excel or LibreOffice answer will probably be helpful as I can probably turn them into a GS solution. I realize I could just make a simple regexextract in 30 or so columns, but I'd really rather not do that. Thanks in advance, even if you just give me an idea of what direction to head in.

bobble bubble
  • 16,888
  • 3
  • 27
  • 46
kq76
  • 3
  • 2
  • 1
    Post few sample data then desired output. – Harun24hr Dec 12 '22 at 02:48
  • Okay, this string is made up, but it should get the idea across: Farewell.to.the.Master.1940.Harry_Bates.SCI-FI.SHORT_STORY.MOVIE_INSPIRATION.CLASSIC.LargePrint.Hardcover.Sterling. The only substrings I want from that are Sci-Fi, Short_Story, and Movie_Inspiration, but, again, there are other substrings that could be there instead, or there might be none, and while Classic is also uppercase it isn't wanted. – kq76 Dec 12 '22 at 03:54

1 Answers1

0

You could try something like this, that would filter all values that match your desired list of substrings. Replace F1:F2 with the range where you save the values you want to appear, and A3 with the cell of the substring. If you need you can set this as an array with Map or BYROW, for example

=filter(split(A3,"."),INDEX(REGEXMATCH(SPLIT(A3,"."),JOIN("|",F1:F2))))
Martín
  • 7,849
  • 2
  • 3
  • 13
  • That looks like it might be going the right direction! However, when I plug it in (and switch the F1:F2 to my range) it separates every substring that is delimited by periods, not just the ones in that range. Filter, RegexMatch, and Join are all new to me (as are Map and ByRow) so I'll look into them and see if I can make it work. I do like that it's looking at that range! I didn't want to have to include the list in the formula because if I added something to the list I'd have to update the formulas. So yeah, I like this approach, but it's not quite there yet. – kq76 Dec 12 '22 at 06:10
  • Ah hah! It does work. There were just 2 problems. 1) The range had empty cells (I wanted to leave room to add more later), but I fixed that by changing the JOIN part to `"^("&JOIN("|", FILTER(Refs!E2:E31, LEN(Refs!E2:E31)))&")$"` so it would strip out any blanks (as everything would match them). And 2) my range wasn't in all caps like the substrings are, but that's an easy fix by either putting the range in all caps or changing my new second filter part to `FILTER(UPPER(Refs!E2:E31)`. – kq76 Dec 12 '22 at 08:43
  • Thank you very much! This is wonderful. I've had an extraordinarily positive first impression of this site and community. – kq76 Dec 12 '22 at 08:44
  • I'm glad it was useful and that you could work out those next steps. You can do it case insensitive with (?i) using `^(?i)("&JOIN ..` Welcome! – Martín Dec 12 '22 at 10:28