I have a text like this

EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult
RESTR_EXPR   soft tissue/muscle tissue tumor

Right now I want to extract only the last item in the EXPRESS line, which is adult.

My pattern is:

[|](.*?)\n

The match extends all the way back to the first |, so I get muscle| testis| normal| tumor| fetus| adult instead of just adult. Is there any way to solve this issue?

Wiktor Stribiżew
  • How about just splitting at `|`? – Klaus D. Nov 28 '22 at 06:14
  • `[^|]*\n` should give you a greedy match for everything until the end of the line that does not contain the character `|`, which should produce the result you want. So instead of trying to split on `|`, just take everything after the last occurrence (the last text between the last `|` and the newline) – Green Cloak Guy Nov 28 '22 at 06:15
  • Your pattern really means "as little as possible from (the _first_ occurrence of) `|` through `\n`". – tripleee Nov 28 '22 at 06:41
  • Another idea is to let *greedy* `.*` consume everything before the *last* `|` and [capture](https://www.regular-expressions.info/brackets.html) anything after it: [`.*\| *(.+)`](https://regex101.com/r/iAZhFC/2) [Python demo](https://tio.run/##LY67DsIwDEX3foWVKSkoCDEzdkctA0OlqtCUVspLjjNUyr@HQPF2LZ9z7TdanL3kvBrvkABVVZltCIRwBc6ax61tug5@89TOTQlMDC@tEpAKtIYE1qEZdcnROEwwK4plO05RU28LfW@Hr6YIgpsJChOiOu2Wf9pZJkp3qUUlgxrxtXBksu4T1FweBDvC/li58rha4ka@0UXPz0Lk/AE) – bobble bubble Nov 28 '22 at 12:33
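
Two of the suggestions from the comments, splitting at `|` and the greedy `.*\| *(.+)` pattern, could be tried in Python roughly as follows. This is only a sketch: the variable names are made up for illustration, and the sample text is the one from the question.

```python
import re

# Sample text from the question
text = (
    "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult\n"
    "RESTR_EXPR   soft tissue/muscle tissue tumor\n"
)

# Option 1: find the EXPRESS line, split it at "|" and keep the last piece
express_line = next(
    line for line in text.splitlines() if line.startswith("EXPRESS")
)
last_by_split = express_line.split("|")[-1].strip()

# Option 2: greedy .* runs to the end of the line and backtracks to the
# last "|"; the group then captures whatever follows it
match = re.search(r".*\| *(.+)", text)
last_by_greedy = match.group(1) if match else None

print(last_by_split)   # adult
print(last_by_greedy)  # adult
```

Note that `re.search` returns only the first line that contains a `|`, so the regex variant assumes the EXPRESS line is the first such line in the text.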

2 Answers

You can match a pipe char followed by optional spaces, then capture the rest in a group that excludes pipe chars, and take the capture group value.

If there has to be a newline at the end of the string:

\|[^\S\n]*([^|\n]*)\n

Explanation

  • `\|` Match `|`
  • `[^\S\n]*` Match optional whitespace chars without newlines
  • `(` Capture group 1
    • `[^|\n]*` Match optional chars except for `|` or a newline
  • `)` Close group 1
  • `\n` Match a newline

Regex demo
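
A minimal Python sketch of how this pattern could be used, assuming the sample text from the question is held in a string (the variable names are illustrative only):

```python
import re

# Sample text from the question, with a newline at the end of each line
text = (
    "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult\n"
    "RESTR_EXPR   soft tissue/muscle tissue tumor\n"
)

# The group cannot cross a "|" or a newline, so the trailing \n only
# matches after the last "|" on the line
match = re.search(r"\|[^\S\n]*([^|\n]*)\n", text)
last_item = match.group(1) if match else None

print(last_item)  # adult
```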

Or asserting the end of the string:

\|[^\S\n]*([^|\n]*)$
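
In Python, `$` without flags only matches at the very end of the string (or just before a final newline), so for multi-line text this variant would need `re.MULTILINE`. A small sketch of the difference, with illustrative variable names:

```python
import re

pattern = r"\|[^\S\n]*([^|\n]*)$"

# A single line with no trailing newline: $ matches at the end of the string
line = "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult"
single = re.search(pattern, line)

# Two lines: without re.MULTILINE, $ does not match at the end of line 1
text = line + "\nRESTR_EXPR   soft tissue/muscle tissue tumor\n"
no_flag = re.search(pattern, text)
with_flag = re.search(pattern, text, re.MULTILINE)

print(single.group(1))     # adult
print(no_flag)             # None
print(with_flag.group(1))  # adult
```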
The fourth bird

You could use this one. It skips the leading space, handles the \r\n case, and is non-greedy:

\|\s*([^|]*?)\r?\n

Tested here
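
A sketch of this approach in Python, written with the lazy quantifier inside the capture group so that group 1 holds the whole item (a quantifier placed on the group itself would leave only the last repeated character in it). The helper name `last_item` and the two sample strings are assumptions made for illustration:

```python
import re

# Quantifier inside the group: group 1 is the full text after the last "|"
pattern = re.compile(r"\|\s*([^|]*?)\r?\n")

unix_text = (
    "EXPRESS      blood| muscle| testis| normal| tumor| fetus| adult\n"
    "RESTR_EXPR   soft tissue/muscle tissue tumor\n"
)
# Same text with Windows-style line endings
windows_text = unix_text.replace("\n", "\r\n")


def last_item(text):
    """Return the text after the last "|" on the first line containing one."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(last_item(unix_text))     # adult
print(last_item(windows_text))  # adult
```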