I want to match all the words between the | and the end of the row (VEHICLE and all the vehicle names). I don't want the extra lines matched.

I started with this regex, but it doesn't exclude the | symbol:

^\|.*$

So then I tried this syntax, but it leaves off the last word on the line:

(?<=\|)(.*)(?=\|)

Samples:

Text above I don't want matched

| VEHICLE | Truck | | Bike
| VEHICLE | Car          | | Scooter
| VEHICLE | Sedan | Mini Van    | 
| VEHICLE | Sedan | white, brown, black |     
| VEHICLE | Sedan | pack/cars   | 

Text below that I don't want matched

Tommy Wu

4 Answers

Repeat any character but a | in the middle, and in the lookahead, alternate between the | and the end of the line:

(?<=\|)([^\|]+)(?=\||$)

https://regex101.com/r/YmtEPE/1

Note that there probably isn't any need for the capturing group in the middle; it's equivalent to the entire match anyway.
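
As a rough illustration, here is a minimal Python sketch of that pattern applied to one sample row. Python's `re` module and the helper name `cells` are assumptions of this sketch, not part of the answer. The raw matches keep the padding spaces and include the whitespace-only cell, so the helper strips and filters them.

```python
import re

# Pattern from the answer: text after a pipe, up to the next pipe or the end.
CELL = re.compile(r"(?<=\|)([^\|]+)(?=\||$)")

def cells(line):
    """Return the non-empty, whitespace-trimmed cells of one table row (hypothetical helper)."""
    return [m.strip() for m in CELL.findall(line) if m.strip()]

line = "| VEHICLE | Truck | | Bike"
print(CELL.findall(line))  # [' VEHICLE ', ' Truck ', ' ', ' Bike']
print(cells(line))         # ['VEHICLE', 'Truck', 'Bike']
```

The sketch works on one row at a time because `[^\|]` also matches a newline, so on multi-line input a match can run from the end of one row into the start of the next.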

CertainPerformance

You may use

[^|\s](?:[^|]*[^|\s])?

See the regex demo

Details

  • [^|\s] - any char but | and whitespace
  • (?:[^|]*[^|\s])? - an optional sequence of any 0+ chars other than |, followed by any char but | and whitespace.
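
A minimal Python sketch of how this pattern behaves, assuming Python's `re` module. The function name `table_cells` is hypothetical, and filtering for rows that start with `|` is done in plain Python here rather than inside the regex, which is a choice of this sketch and not something the answer prescribes.

```python
import re

# Pattern from the answer: starts and ends on a char that is neither | nor whitespace.
WORDS = re.compile(r"[^|\s](?:[^|]*[^|\s])?")

def table_cells(text):
    """Collect trimmed cell values from lines that begin with a pipe (hypothetical helper)."""
    result = []
    for line in text.splitlines():
        if line.startswith("|"):  # skip ordinary text rows
            result.extend(WORDS.findall(line))
    return result

sample = "\n".join([
    "Text above I don't want matched",
    "| VEHICLE | Sedan | white, brown, black |     ",
    "| VEHICLE | Sedan | pack/cars   | ",
    "Text below that I don't want matched",
])
print(table_cells(sample))
# ['VEHICLE', 'Sedan', 'white, brown, black', 'VEHICLE', 'Sedan', 'pack/cars']
```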
Wiktor Stribiżew
  • Your solution works great for this section of text but I have regular text in other rows in the same file. How would I modify the text you provided above to include ONLY lines that start with `|`? – Tommy Wu Nov 30 '18 at 21:54
  • @TommyWu You might want `(?:\G(?!\A)\h*\|\h*|^\|\h*)\K(?:[^|\s](?:[^|]*[^|\s])?)?`, see [demo](https://regex101.com/r/e2Cuit/2). – Wiktor Stribiżew Dec 01 '18 at 15:22

You can go with:

(?<=\| *)[^|\s]+(?= *\||$)

Details:

  • [^|\s]+ - matches 1+ characters other than whitespace and |.
  • $ - matches end of a string (or a line when m flag is enabled).

Demo here

Update:

(?<=\| *)(?! +)[^|]+(?= *\||$)
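
Both patterns rely on a variable-width lookbehind, `(?<=\| *)`, which not every regex engine accepts. As a hedged sketch, the Python snippet below shows that Python's `re` module rejects it, and gives one possible rewrite of the first pattern that consumes the pipe and spaces and returns the word through a capturing group instead. The names `compiles` and `CAPTURE` are made up for this example, and the rewrite is an assumption of this sketch, not the answer's own suggestion.

```python
import re

ORIGINAL = r"(?<=\| *)[^|\s]+(?= *\||$)"

def compiles(pattern):
    """Report whether Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Rewrite without lookbehind: the pipe and spaces are consumed, the word is group 1.
CAPTURE = re.compile(r"\| *([^|\s]+)(?= *(?:\||$))")

line = "| VEHICLE | Truck | | Bike"
print(compiles(ORIGINAL))     # False: look-behind requires a fixed-width pattern
print(CAPTURE.findall(line))  # ['VEHICLE', 'Truck', 'Bike']
```

Like the first pattern, this rewrite only handles single-word cells.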

vrintle
  • Your syntax seemed to work great but I noticed 2 issues: 1) It doesn't allow tab whitespace after any of the words 2) Looks like Sublime's Syntax highlighter (YAML) won't accept it. Any ideas? – Tommy Wu Nov 30 '18 at 22:29

To match only the words, you could first match a pipe and a whitespace character, then capture one or more word characters in a group, and finally use a positive lookahead to assert that what follows is either a whitespace character and a pipe, or the end of the string.

Your values are in the first capturing group.

\|\s(\w+)(?=\s\||$)

Explanation

  • \|\s Match | and a whitespace character
  • (\w+) Capture 1+ word characters in group 1
  • (?=\s\||$) Positive lookahead to assert that what follows is either a whitespace character and a pipe, or the end of the string

Regex demo
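
A small Python sketch of this first pattern, assuming Python's `re` module (the variable name `WORD` is only illustrative). It also shows the limitation that comes up in the comments: a word followed by more than one whitespace character before the pipe is skipped, because the lookahead allows exactly one.

```python
import re

# First pattern from the answer; the word is in capturing group 1.
WORD = re.compile(r"\|\s(\w+)(?=\s\||$)")

print(WORD.findall("| VEHICLE | Truck | | Bike"))
# ['VEHICLE', 'Truck', 'Bike']

# "Car" is followed by several spaces, so the lookahead (?=\s\|) fails for it.
print(WORD.findall("| VEHICLE | Car          | | Scooter"))
# ['VEHICLE', 'Scooter']
```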

Update:

To only match the words in Sublime including matching 1+ tabs you could use:

\|\h+\K\w+(?:[,\/]?\h*\w+)*(?=\h*(?:\||$))

Regex demo

Explanation

  • \|\h+ Match a pipe followed by 1+ horizontal whitespace characters
  • \K Forget what has been matched so far
  • \w+ Match 1+ word characters
  • (?: Non capturing group
    • [,\/]?\h*\w+ Match an optional comma or a forward slash followed by 0+ times a horizontal whitespace character and 1+ word characters
  • )* Close non capturing group and repeat 0+ times
  • (?= Positive lookahead to assert what follows is
    • \h* Match 0+ horizontal whitespace characters
    • (?:\||$) Match either a pipe or the end of the string
  • ) Close positive lookahead
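
Python's `re` module supports neither `\h` nor `\K`, so the sketch below is an adaptation rather than the answer's exact pattern: `\h` is replaced by the character class `[ \t]`, and a capturing group stands in for `\K`. The name `CELL` and the sample rows are assumptions made for illustration.

```python
import re

# Adapted pattern: [ \t] instead of \h, and group 1 instead of \K.
CELL = re.compile(
    r"\|[ \t]+"                   # a pipe followed by 1+ spaces or tabs
    r"(\w+(?:[,/]?[ \t]*\w+)*)"   # words, optionally joined by a comma, slash, or blanks
    r"(?=[ \t]*(?:\||$))",        # then optional blanks and a pipe or the end of the line
    re.MULTILINE,
)

sample = "\n".join([
    "Text above I don't want matched",
    "| VEHICLE | Car\t| | Scooter",
    "| VEHICLE | Sedan | Mini Van    | ",
    "| VEHICLE | Sedan | white, brown, black |     ",
    "| VEHICLE | Sedan | pack/cars   | ",
])
for value in CELL.findall(sample):
    print(value)
```

With `re.MULTILINE`, `$` matches at the end of each line, so the pattern can be run over the whole text; lines without a pipe produce no matches.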
The fourth bird
  • The answer almost works but has 2 issues: 1. It doesn't capture words that are followed by tabs 2. It includes the `|`, which I want to exclude since I'm using this in Sublime syntax file – Tommy Wu Nov 30 '18 at 22:42
  • Apologies, I realized there's another complication in my sample data that I should have called out. Sometimes the value to be captured is more than one word and/or a comma-separated list: a) Mini Van b) car, van, truck c) pack/cars. How do I modify the \w+ part to capture these use cases? – Tommy Wu Nov 30 '18 at 23:23
  • @TommyWu Do you mean like this? `\|\h+\K\w+(?:[,\/]?\h*\w+)*(?=\h*(?:\||$))` [See demo](https://regex101.com/r/Pbnhwb/1) – The fourth bird Nov 30 '18 at 23:27
  • that's exactly right, thank you! Now I just have to figure out why it's not working in my Sublime syntax highlighter. – Tommy Wu Nov 30 '18 at 23:38
  • @TommyWu When you use find and enable the regex, do you see your matches? – The fourth bird Nov 30 '18 at 23:46
  • While I see matches when I test my data on regex101.com, I do not see any matches in Sublime. Oddly, Sublime will save the changes I made to the syntax file, which is usually an indication that it accepts the syntax. – Tommy Wu Nov 30 '18 at 23:49
  • @TommyWu That means that the regex works but you don't see the matches? I see the matches when I run this on ubuntu with Sublime version 3.1.1 and version 2.0.2 – The fourth bird Dec 01 '18 at 00:03
  • Sorry, I misunderstood what you said. Yes, Sublime Search with regex will find the correct matches. The syntax file (YAML format) does not seem to find any matches – Tommy Wu Dec 01 '18 at 00:04
  • @TommyWu I see you asked a [new question](https://stackoverflow.com/questions/53567446/sublime-custom-syntax-highlighter-not-matching-the-same-as-search-with-regex) and when I read [this page](https://stackoverflow.com/questions/35343933/type-of-regex-used-by-sublime-text-3), I think the `\h` is not supported. You could use a character class that matches a space or a tab, like `[ \t]`. For example `\|[ \t]+\K\w+[^|[ \t]]+(?:[,\/]?[ \t]*\w+)*(?=[ \t]*(?:\||$))` [Regex demo](http://rubular.com/r/aN3QRRkWFj) – The fourth bird Dec 01 '18 at 09:58