
I want to parse a table line using regex.

Input

   |---|---|---|
|---|---|---|

So far I've come up with this regex:

/^(?<indent>\s*)\|(?<cell>-+|)/g

Regex101 Link: https://regex101.com/r/wzMYxd/1

But this regex is incomplete.

This only finds the first cell (---|), but I want to find all the following cells as separate matches too.

Question: Can the following cells be captured with the same pattern in a single regex?

Expected output: groups with an array of matched cells: ["---|", "----|", "---|"]

Note: the number of - characters in a cell is not fixed.

bobble bubble
Kiran Parajuli
  • How about [`^(?<indent>\h*)|\G\|(?<cell>-+)`](https://regex101.com/r/4woKJ7/1)? What tool/lang are you using? – bobble bubble Jul 07 '22 at 16:32
  • Woah, amazing. It's working as expected. Let me try some more cases. I'm using nodejs for parsing. – Kiran Parajuli Jul 07 '22 at 16:35
  • I doubt that works in JS, maybe it's enough to use [`\|(?<cell>-+)|^(?<indent>[\t ]*)`](https://regex101.com/r/Cd9EtU/1) – bobble bubble Jul 07 '22 at 16:39
  • true, 1st one does not work with js :( – Kiran Parajuli Jul 07 '22 at 16:40
  • Are you processing line-wise or a multiline string? Is it important that the matches are chained to each other? The second pattern does not chain the matches (no `\G`). In JS there is the *sticky* `y` flag for chaining matches from the start, but it only makes sense for single-line input. – bobble bubble Jul 07 '22 at 16:44
  • The 2nd regex is pretty much complete. Can we also reject lines like `|--| -- |---|`? – Kiran Parajuli Jul 07 '22 at 16:44
  • But if I compare it with the cell count of the preceding/following line, I can tell whether this is a table line or not. (I'm processing line-wise, a single line at a time.) – Kiran Parajuli Jul 07 '22 at 16:45
  • I added a second solution with just one regex in my answer (maybe works for you). It needs to be used with the `y` (sticky) flag. Happy coding then! :) – bobble bubble Jul 07 '22 at 17:38
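
To make the chaining point from the comments concrete, here is a minimal Node.js sketch of the second (unchained) pattern. It assumes the named groups are `indent` and `cell`, as in the question. `looseCells` is a hypothetical helper name used only for illustration.

```javascript
// Unchained pattern from the comments; group names `indent` and `cell` are assumed.
const LOOSE = /\|(?<cell>-+)|^(?<indent>[\t ]*)/g;

// Hypothetical helper: collect every `cell` capture found anywhere in the line.
function looseCells(line) {
  return [...line.matchAll(LOOSE)]
    .filter((m) => m.groups.cell !== undefined)
    .map((m) => m.groups.cell);
}

console.log(looseCells("   |---|----|---|")); // [ '---', '----', '---' ]
// Without chaining, the regex skips over the malformed middle part
// instead of rejecting the line:
console.log(looseCells("|--| -- |---|")); // [ '--', '---' ]
```

Because a global regex without `\G` or the `y` flag is free to skip characters between matches, a malformed line still yields cells. That is why a separate validation step or the sticky flag is needed.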

1 Answer


How about first verifying that the line matches the pattern:

^[ \t]*\|(?:-+\|)+$

See this demo at regex101. If it matches, extract the indent and the cells:

^(?<indent>[\t ]*)\||(?<cell>-+)\|

Another demo at regex101 (explanation on the right side)
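
The two steps above might be combined in Node.js roughly like this. It is a minimal sketch, not a definitive implementation: `parseSeparatorLine` and the constant names are hypothetical, and only the two regexes come from the answer.

```javascript
// Step 1: validate the whole line. Step 2: extract indent and cells.
const TABLE_LINE = /^[ \t]*\|(?:-+\|)+$/;
const PARTS = /^(?<indent>[\t ]*)\||(?<cell>-+)\|/g;

// Hypothetical helper: returns null for lines that are not table separator lines.
function parseSeparatorLine(line) {
  if (!TABLE_LINE.test(line)) return null;
  let indent = "";
  const cells = [];
  for (const m of line.matchAll(PARTS)) {
    // Each match fills exactly one of the two named groups.
    if (m.groups.indent !== undefined) indent = m.groups.indent;
    if (m.groups.cell !== undefined) cells.push(m.groups.cell);
  }
  return { indent, cells };
}

// indent is three spaces; cells are "---", "----", "---"
console.log(parseSeparatorLine("   |---|----|---|"));
console.log(parseSeparatorLine("|--| -- |---|")); // null
```

Note that the `cell` group captures only the dashes. If the trailing `|` should be part of each result, as in the question's expected output, it can be appended when collecting the cells.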


It may also work with just one regex, using the sticky flag y and a lookahead for validation:

/^(?<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?<cell>-+)\|/gy

One more demo at regex101

The lookahead checks once, right after the first |, whether the rest of the string matches the pattern. If this first match fails, no cell can match either: the y flag "glues" each match to the end of the previous one, and (?!^) keeps the cell alternative from matching at the start of the string.
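
As a rough illustration of how the sticky variant could be used in Node.js, here is a minimal sketch for single-line input. `parseWithSticky` and `ROW` are hypothetical names; the regex is the one shown above.

```javascript
// Single regex with the sticky flag: every match must start where the last one ended.
const ROW = /^(?<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?<cell>-+)\|/gy;

// Hypothetical helper: returns null when the line fails the lookahead validation.
function parseWithSticky(line) {
  let indent = null;
  const cells = [];
  // matchAll works on a copy of the regex, so ROW.lastIndex is left untouched.
  for (const m of line.matchAll(ROW)) {
    if (m.groups.indent !== undefined) indent = m.groups.indent;
    if (m.groups.cell !== undefined) cells.push(m.groups.cell);
  }
  // If the first alternative never matched, nothing else could match either.
  return indent === null ? null : { indent, cells };
}

// indent is three spaces; cells are "---", "----", "---"
console.log(parseWithSticky("   |---|----|---|"));
console.log(parseWithSticky("|--| -- |---|")); // null
```

With only the g flag, the second alternative could still pick up cells from an invalid line. The y flag is what makes the failed first match stop everything.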

bobble bubble
  • I've adapted the regex that you suggested to `/^\s*\|(?=(?:[^/|]+\|)+$)|(?!^)(?<cell>[^|]+)\|/gy` to catch table rows like `| cell content | cell content |`. See: https://regex101.com/r/5G3qP5/1. But while using the sticky flag, I cannot use the `/` char inside the cell content. The above regex does not match text like `|---|---| |`. Can you tell me what is going on? – Kiran Parajuli Jul 14 '22 at 16:33
  • @KiranParajuli Because you put the `/` into a [negated class](https://www.regular-expressions.info/charclass.html#negated) `[^/|]`, [just remove the `/` from it](https://regex101.com/r/6Torke/1). – bobble bubble Jul 14 '22 at 16:43
  • OMG, I thought I was escaping the `|` character, but it was messing the regex up. Thank you @bobblebubble! You're great – Kiran Parajuli Jul 14 '22 at 16:56
  • YW @KiranParajuli :) The good thing about character classes is that you don't need much escaping of the characters inside. Just e.g. the hyphen for matching a literal `-` if it's not at the start or end of the class (between chars it indicates a range inside a char class), or brackets (not even strictly necessary, but it's good practice). Also, the escape character is a backslash, not a slash. – bobble bubble Jul 14 '22 at 18:54
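
The effect of the stray `/` in the negated class can be checked with a small sketch like the following. The sample row `| a/b | c |` and the names `WITH_SLASH`, `WITHOUT_SLASH` and `cellsOf` are assumptions made for illustration. The group name `cell` is assumed as well.

```javascript
// `[^/|]` excludes both `/` and `|`; `[^|]` excludes only `|`.
const WITH_SLASH = /^\s*\|(?=(?:[^/|]+\|)+$)|(?!^)(?<cell>[^|]+)\|/gy;
const WITHOUT_SLASH = /^\s*\|(?=(?:[^|]+\|)+$)|(?!^)(?<cell>[^|]+)\|/gy;

// Hypothetical helper: list the captured cell contents for a given regex.
function cellsOf(regex, line) {
  return [...line.matchAll(regex)]
    .filter((m) => m.groups.cell !== undefined)
    .map((m) => m.groups.cell);
}

const row = "| a/b | c |";
// The lookahead rejects the row because a cell contains `/`:
console.log(cellsOf(WITH_SLASH, row)); // []
// With `/` removed from the class, both cells are captured:
console.log(cellsOf(WITHOUT_SLASH, row)); // [ ' a/b ', ' c ' ]
```

Inside a character class, a `/` is just another literal character to exclude, not an escape for the `|` that follows it.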