I am trying to extract part of each of the strings below.

I tried (.*)(?:table)?, but it fails in the last case. How can I make the expression capture the entire string when the text "table" is absent?

  1. Text: "diningtable" Expected match: "dining"
  2. Text: "cookingtable" Expected match: "cooking"
  3. Text: "cooking" Expected match: "cooking"
  4. Text: "table" Expected match: ""
learningtocode

4 Answers


Rather than try to match everything but table, you should do a replacement operation that removes the text table.

Depending on the language, this might not even need regex. For example, in Java you could use:

String output = input.replace("table", "");
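To see what that one-liner does with the four inputs from the question, here is a minimal sketch. The class and method names are made up for illustration. One thing to keep in mind: String.replace removes every literal occurrence of "table", not only a trailing one.

```java
// Sketch only: RemoveTableDemo and stripTable are hypothetical names.
final class RemoveTableDemo {

    // Removes every literal occurrence of "table"; no regex is involved.
    static String stripTable(String input) {
        return input.replace("table", "");
    }

    public static void main(String[] args) {
        String[] samples = {"diningtable", "cookingtable", "cooking", "table"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> \"" + stripTable(s) + "\"");
        }
    }
}
```

If "table" can also appear in the middle of the input (as in "tablettable"), this removes those occurrences too, which may or may not be what you want.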
4castle
  • Good point! I will take the suggestion :) But I was trying to learn regex and this looked like a simple enough problem, but apparently not. – learningtocode Aug 12 '16 at 06:20

The (.*)(?:table)? pattern fails with table (it matches the whole word) because the first group (.*) is a greedy dot pattern that grabs the whole string into Group 1. The optional non-capturing group then looks for table at the end of the string, finds nothing left to consume, and matches an empty string, so the engine never has to give anything back from Group 1.
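A quick way to observe this is to print Group 1 for a few inputs. This is just a sketch with hypothetical class and method names, assuming Java's java.util.regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: GreedyGroupDemo and groupOne are hypothetical names.
final class GreedyGroupDemo {
    private static final Pattern NAIVE = Pattern.compile("(.*)(?:table)?");

    // Returns Group 1 of the first match, or null if nothing matched.
    static String groupOne(String input) {
        Matcher m = NAIVE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The greedy (.*) swallows "table" every time, so it always ends up in Group 1.
        for (String s : new String[] {"diningtable", "cooking", "table"}) {
            System.out.println("\"" + s + "\" -> \"" + groupOne(s) + "\"");
        }
    }
}
```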


The trick is to let the first group match only characters that do not start a table sequence, and to put it before the optional group:

^((?:(?!table).)+)(?:table)?$

See the regex demo

Now, Group 1 - ((?:(?!table).)+) - contains a tempered greedy token (?:(?!table).)+ that matches 1 or more chars other than a newline that do not start a table sequence. Thus, the first group will never match table.

The anchors make the regex match the whole line.
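For reference, this is roughly how the anchored pattern could be applied in Java. The class and method names are hypothetical. Note that because the token is quantified with +, Group 1 needs at least one character, so the input "table" on its own produces no match at all rather than an empty Group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TemperedTokenDemo and extract are hypothetical names.
final class TemperedTokenDemo {
    private static final Pattern TEMPERED =
            Pattern.compile("^((?:(?!table).)+)(?:table)?$");

    // Returns Group 1 when the whole input matches, otherwise null.
    static String extract(String input) {
        Matcher m = TEMPERED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"diningtable", "cookingtable", "cooking", "table"}) {
            System.out.println("\"" + s + "\" -> " + extract(s));
        }
    }
}
```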

NOTE: Non-regex solutions might turn out more efficient though, as a tempered greedy token is rather resource consuming.

NOTE2: Unrolling the tempered greedy token usually improves performance noticeably:

^([^t]*(?:t(?!able)[^t]*)*)(?:table)?$

See another demo

But usually it looks "cryptic", "unreadable", and "unmaintainable".
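A small sketch of the unrolled pattern in Java, with comments spelling out each piece (names are hypothetical). Unlike the +-quantified token above, the unrolled group uses *, so it can be empty, and the input "table" matches with an empty Group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: UnrolledPatternDemo and extract are hypothetical names.
final class UnrolledPatternDemo {
    private static final Pattern UNROLLED = Pattern.compile(
            "^("
            + "[^t]*"                   // any run of characters other than 't'
            + "(?:t(?!able)[^t]*)*"     // a 't' not followed by "able", then more non-'t' chars
            + ")(?:table)?$");          // optional "table" at the very end

    // Returns Group 1 when the whole input matches, otherwise null.
    static String extract(String input) {
        Matcher m = UNROLLED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"diningtable", "cooking", "table", "tomato"}) {
            System.out.println("\"" + s + "\" -> \"" + extract(s) + "\"");
        }
    }
}
```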

Wiktor Stribiżew
  • So, only use this solution **if** you cannot use a "normal" programming language. An unrolled version is much faster, but is really hard to maintain since few people can understand a good, efficient regex pattern. – Wiktor Stribiżew Aug 12 '16 at 06:24
  • Is there a way to extract everything except the last "table", like "tablettable", to return tablet? – learningtocode Aug 12 '16 at 06:28
  • Well, that is of course possible with *matching* - [`^((?:(?!table$).)+)(?:table)?$`](https://regex101.com/r/oY0nI0/1). But you'd better just check if the string ends with `table`, and remove it. Or with regex.replace - `table$`. – Wiktor Stribiżew Aug 12 '16 at 06:30
  • Yep, I will do that. It looks a bit odd though that I have to exclude "table" from the first group to achieve this. All I want is everything except the last optional "table". Thank you though..makes me feel better that it indeed was a difficult problem :) – learningtocode Aug 12 '16 at 06:34
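The three options mentioned in these comments for dropping only a trailing "table" could look like this in Java. This is a sketch with hypothetical names, not a definitive implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TrailingTableDemo and its method names are hypothetical.
final class TrailingTableDemo {
    private static final String SUFFIX = "table";
    private static final Pattern BEFORE_TRAILING_TABLE =
            Pattern.compile("^((?:(?!table$).)+)(?:table)?$");

    // Matching approach: Group 1 is everything before a trailing "table".
    static String viaMatch(String input) {
        Matcher m = BEFORE_TRAILING_TABLE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Plain string check: cut the suffix off if it is there.
    static String viaEndsWith(String input) {
        return input.endsWith(SUFFIX)
                ? input.substring(0, input.length() - SUFFIX.length())
                : input;
    }

    // Regex replacement anchored at the end of the input.
    static String viaRegexReplace(String input) {
        return input.replaceFirst("table$", "");
    }

    public static void main(String[] args) {
        System.out.println(viaMatch("tablettable"));
        System.out.println(viaEndsWith("tablettable"));
        System.out.println(viaRegexReplace("tablettable"));
    }
}
```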

If you want to use regex, you can use this one:

(^.*)(?=table)|(?!.*table.*)(^.+)

See demo here: regex101

The idea is: match everything from the beginning of the line (^) up to the word table; or, if table does not occur in the string, match at least one symbol (to avoid matching empty lines). Thus, when the string is just the word table, it returns an empty string (because it matches from the beginning of the line up to the word table).
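With this pattern the wanted text is the overall match, not one particular group, so in Java you could read it with group() after find(). A minimal sketch, with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: LookaheadAlternationDemo and extract are hypothetical names.
final class LookaheadAlternationDemo {
    private static final Pattern PATTERN =
            Pattern.compile("(^.*)(?=table)|(?!.*table.*)(^.+)");

    // Returns the overall match (whichever alternative fired), or null if none did.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"diningtable", "cookingtable", "cooking", "table"}) {
            System.out.println("\"" + s + "\" -> \"" + extract(s) + "\"");
        }
    }
}
```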

Maria Ivanova

In addition to the other great answers, you could also use alternation:

^(?|(.*)table$|(.*))$

This makes use of a branch reset, so your desired content is always stored in group 1. If your language/tool of choice doesn't support it, you would have to check which of groups 1 and 2 contains the string.
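As far as I know, Java's java.util.regex is one of the engines without branch reset, so there the fallback of checking both groups might look like the sketch below. The names are hypothetical, and the pattern is an adaptation of the one above with a plain non-capturing group instead of (?|...).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: AlternationFallbackDemo and extract are hypothetical names.
final class AlternationFallbackDemo {
    // Assumed adaptation: plain alternation, so the two branches use groups 1 and 2.
    private static final Pattern PATTERN = Pattern.compile("^(?:(.*)table|(.*))$");

    // Returns whichever group took part in the match, or null if there was no match.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is set only when the input ends with "table"; otherwise use Group 2.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"diningtable", "cooking", "table", "tablettable"}) {
            System.out.println("\"" + s + "\" -> \"" + extract(s) + "\"");
        }
    }
}
```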

See Demo

Sebastian Proske