Taking a string something like this:

Line 1: Test  TableA
Line 2:    TableA  AWord
Line 3: TableA AWord
Line 4: This.TableA
Line 5: This. TableA Aword

I want to match where these criteria are met:

  1. The word TableA is found
  2. There is no dot anywhere on the same line where TableA is found
  3. There may be any number of spaces or other characters in front of the word TableA
  4. There may be characters after the word TableA

So in the scenario above:

  • Line 1,2 & 3 should all match - but ONLY on the word TableA
  • Line 4 & 5 should NOT match
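As a reference for the expected result, the criteria can be written down without any regex. This is only a sketch in Python rather than PowerShell, and `should_match` is a hypothetical helper name. It decides whether a line qualifies, not which part of the line gets matched.

```python
def should_match(line, word="TableA"):
    # Criteria 1 and 2: the word is present and the line contains no dot.
    # Criteria 3 and 4 place no restriction on the surrounding text.
    return word in line and "." not in line


# The five sample lines, without the "Line N:" labels.
lines = [
    "Test  TableA",
    "   TableA  AWord",
    "TableA AWord",
    "This.TableA",
    "This. TableA Aword",
]

expected = [should_match(line) for line in lines]
print(expected)
```

Any candidate pattern can then be compared line by line against `expected`.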

I'm having some real trouble getting this to work though.

-

This matches on every line except #3 - and matches from the start of the line to the end of TableA

^([^\.].*)(?:TableA)

-

This matches Line 1,2,3 & 5 and for 1 & 2 it matches from the start of the line to the end of TableA

(?!\.).(\s)*(TableA)(?=\s|$)

-

This matches lines 1, 2 & 3 (the closest I've gotten to the right answer) but matches from the start of the line to the end of TableA

^(?!.*\.).*(TableA)
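To see where the extra characters come from, the pattern above can be run with the whole match and the capturing group printed side by side. This is a sketch in Python's `re` module (an assumption, since the question is about PowerShell), with multiline mode so that `^` anchors at each line. The variable names are illustrative.

```python
import re

# The five sample lines, without the "Line N:" labels.
text = "\n".join([
    "Test  TableA",
    "   TableA  AWord",
    "TableA AWord",
    "This.TableA",
    "This. TableA Aword",
])

# The third attempt from the question, applied per line.
pattern = re.compile(r"^(?!.*\.).*(TableA)", re.MULTILINE)

whole_matches = []
group_spans = []
for m in pattern.finditer(text):
    # group(0) is everything the pattern consumed; group(1) is only the word.
    whole_matches.append(m.group(0))
    group_spans.append(m.span(1))
    print(repr(m.group(0)), "->", repr(m.group(1)), m.span(1))
```

The `.*` in front of the group consumes the leading characters, so they are part of the overall match even though group 1 holds only `TableA`.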

This thread, "Regex: Match word not containing", contained a solution that does something very similar to what I've managed to produce, but again, it matches every character in front of the word it finds.

This is in PowerShell - so I believe PCRE is effectively what it's using(?)

Celador
  • What programming language or regex flavor are you using? Is it okay to match the whole line and capture `TableA` in a group? – 41686d6564 stands w. Palestine Nov 07 '19 at 14:42
  • Try it like this `^[^\r\n.]*(?<!\S)TableA(?!\S)[^\r\n.]*$` https://regex101.com/r/kMTVTU/1 – The fourth bird Nov 07 '19 at 14:54
  • I guess the simplest solution would be `^[^.]*TableA[^.]*$`. – SamWhan Nov 07 '19 at 14:55
  • "Line 1,2 & 3 should all match - but ONLY on the word TableA " - What's the point of matching only TableA rather than the whole line? I'm not implying it's not a good idea, but in general you want that in order to extract variable content (and even then, capturing groups do the trick just fine), but since TableA is a constant I guess that's not the point. I'm asking because that makes your requirement impossible for most regex engines (needs variable-width lookbehind) – Aaron Nov 07 '19 at 15:07
  • Ah, .NET regex, which PowerShell uses, makes variable-width lookbehind possible. Still, I think you should explain why you want to match only TableA since a regex that does that will be much harder to understand and probably quite a bit less performant. – Aaron Nov 07 '19 at 15:11
  • This is in Powershell - so PCRE(?) - TableA is used as an example, this is actually an object / table name that we'll be looping through, and we want to then replace just that table, but only in those specific scenarios – Celador Nov 07 '19 at 15:20
  • Then as long as the keyword may only appear a single time in a line, the most maintainable and performant way to do so IMO is to match the whole line, grouping what's before the keyword and what's after the keyword in separate capturing groups, then replace with a backreference to the first group, the replacement keyword and a backreference to the second group, e.g. match `^([^.\r\n]*)TableA([^.\r\n]*)$` and replace with `\1NewTableName\2` – Aaron Nov 07 '19 at 15:52
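
The match-and-rebuild approach from the last comment can be sketched as follows. This is Python rather than PowerShell (an assumption made so the example is self-contained), and `rename_table` is a hypothetical helper. The pattern is the one from the comment, with the table name escaped and spliced in.

```python
import re


def rename_table(text, old_name, new_name):
    # Group 1 is the text before the name, group 2 the text after it.
    # Neither group may contain a dot or a line break.
    pattern = re.compile(
        r"^([^.\r\n]*)" + re.escape(old_name) + r"([^.\r\n]*)$",
        re.MULTILINE,
    )
    # A function replacement avoids any escaping issues in new_name.
    return pattern.sub(lambda m: m.group(1) + new_name + m.group(2), text)


sample = "\n".join([
    "Test  TableA",
    "   TableA  AWord",
    "TableA AWord",
    "This.TableA",
    "This. TableA Aword",
])

renamed = rename_table(sample, "TableA", "NewTableName")
print(renamed)
```

As the comment notes, this assumes the name occurs once per line: if it occurs more than once, the greedy first group means only the last occurrence on a line is replaced.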

1 Answer

You could exclude newlines as well, because a negated character class that matches anything but a dot, [^.], will also match a newline.

To match the word TableA, you could use the lookarounds (?<!\S) and (?!\S) to assert that there are no non-whitespace characters directly around it, which prevents matching $TableA$

The value is in the first capturing group.

^[^\r\n.]*(?<!\S)(TableA)(?!\S)[^\r\n.]*$

In parts

  • ^ Start of string (or of each line when the multiline flag is set)
  • [^\r\n.]* Match 0+ times not a . or a newline
  • (?<!\S)TableA(?!\S) Match TableA not surrounded by non-whitespace chars
  • [^\r\n.]* Match 0+ times not a . or a newline
  • $ End of string (or of each line when the multiline flag is set)

Regex demo
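
A quick way to check the pattern and read the position of the first capturing group is sketched below. It uses Python's `re` module with the multiline flag, which is an assumption since the question targets PowerShell. `ANSWER_PATTERN` and `table_spans` are hypothetical names.

```python
import re

# The pattern from this answer, with TableA in capturing group 1.
ANSWER_PATTERN = re.compile(
    r"^[^\r\n.]*(?<!\S)(TableA)(?!\S)[^\r\n.]*$", re.MULTILINE
)


def table_spans(text):
    """Return the (start, end) offsets of group 1 for each matching line."""
    return [m.span(1) for m in ANSWER_PATTERN.finditer(text)]


sample = "\n".join([
    "Test  TableA",
    "   TableA  AWord",
    "TableA AWord",
    "This.TableA",
    "This. TableA Aword",
])

spans = table_spans(sample)
print(spans)
print([sample[start:end] for start, end in spans])
```

The overall match still covers the whole line, but the offsets of group 1 identify just the word, which is enough to replace it in place.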

If you want to use PCRE, you could make use of \K, which drops everything matched so far from the reported match, and a positive lookahead:

^[^\r\n.]*\K(?<!\S)\KTableA(?!\S)(?=[^\r\n.]*$)

Regex demo

The fourth bird
  • Thanks for this - but even in the Demo, it is still matching all characters in front of the word TableA - I need it to only match on TableA (where there's no dot) - and not include any characters before that word – Celador Nov 07 '19 at 16:05
  • @Celador You could use a capturing group https://regex101.com/r/JLf1Fi/1 and use that in the replacement. Or if it is PCRE use `^[^\r\n.]*\K(?<!\S)\KTableA(?!\S)(?=[^\r\n.]*$)` https://regex101.com/r/h5YAyN/1 – The fourth bird Nov 07 '19 at 16:12
  • If it is .NET use `(?<=^[^\r\n.]*)(?<!\S)TableA(?!\S)(?=[^\r\n.]*$)` [Demo](http://regexstorm.net/tester?p=%28%3f%3c%3d%5e%5b%5e%5cr%5cn.%5d*%29%28%3f%3c!%5cS%29TableA%28%3f!%5cS%29%28%3f%3d%5b%5e%5cr%5cn.%5d*%24%29&i=+Test++TableA&o=m) – The fourth bird Nov 07 '19 at 16:15
  • Thank you kindly for this, the PCRE version appears to be working - needed to make a very tiny tweak for a scenario i failed to mention in the question, but this appears to be working! – Celador Nov 07 '19 at 16:27