
I want to match string1 and anything that appears in the following lines:

['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']

Until it encounters the following:

string2

So, with the right regex, the results in bold would be matched in the above 4 cases:

['string1','string2','string3']

['string1' , 'string2' , 'string3']

['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']

['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']


I tried to solve this on https://regex101.com/ using the regex from Question 8020848, but it did not match the string correctly:

((^|\.lpdomain\.com:8080' , ')(string1))+$

It also failed to match only the part I wanted in this text:

['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']

The following is what I received using the regex that you suggested:

@@ -108,7 +108,7 @@ node stringA, stringB, stringC,stringD inherits default {
   'ssl_certificate_file' => 'test.domain.net_sha2_n.crt',
   'ssl_certificate_key_file'=> 'test.domain.net_sha2.key' }
 },
-    service_upstream_members         => ['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
+    service_upstream_members         => [ 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
 service2_upstream_members      => ['string9:8080','string10:8080'],
 service3_upstream_members  => ['string11.domain.com:8080','string12.domain.com:8080','string13.domain.com:8080'],
 service_name                      => 'test_web_nginx_z1',

As you can see, a leading space was left behind for some reason, even though regex101.com shows that all the whitespace is matched by the regex:

'string1[^']*'\s*,\s*

This is what I'm currently using (where server is a variable already defined in the script):

sed -i '' "s/'${server}[^']*'\s*,\s*//"
ARL
  • What about a [`'string1[^']*'`](https://regex101.com/r/eh9Rt8/1)? – Wiktor Stribiżew Nov 06 '16 at 11:09
  • Good, but it won't include spaces. There are several possible combinations: ','string2 OR ' , 'string2 OR ', 'string2. I need it to include everything up to the ' immediately before string2 – ARL Nov 06 '16 at 11:11
  • You mean you need spaces + comma + spaces, too? [`'string1[^']*'\s*,\s*`](https://regex101.com/r/eh9Rt8/3)? – Wiktor Stribiżew Nov 06 '16 at 11:12
  • Thank you, that really helped me. – ARL Nov 06 '16 at 11:31
  • I posted as the answer with explanations. – Wiktor Stribiżew Nov 06 '16 at 11:46
  • @WiktorStribiżew I noticed an unexpected result when actually applying this regex with sed (tested using gnu-sed: stable 4.2.2 (bottled)). My response is too long for a comment, so I have posted more info as an answer (hope that is ok) – ARL Nov 09 '16 at 07:54
  • In sed, I'd rather use `[[:space:]]` instead of `\s`. Also, make sure the pattern is defined in a double quoted string literal. Note that regex101.com does not support the POSIX regex syntax used in sed, so a regex that matches your string at regex101 is not guaranteed to work in sed. – Wiktor Stribiżew Nov 09 '16 at 08:00
  • BTW, [it works with the data you provided](https://ideone.com/MRpeSp). – Wiktor Stribiżew Nov 09 '16 at 08:07
  • I wonder if the sed on OSX is different from the Linux version. The output I pasted is the result of git show HEAD, which reflects the diff changes after running the sed command. Not sure why there is an extra space before the " ' ". – ARL Nov 09 '16 at 08:09
  • Try replacing both `\s` with `[[:space:]]`. Not sure it will help, I am no expert in Mac OSX, but yes, there is a difference as far as I know. – Wiktor Stribiżew Nov 09 '16 at 08:12
  • That did the job. – ARL Nov 09 '16 at 08:16

2 Answers


This should match what you ask (according to your bold highlights) allowing for an unknown amount of spaces, etc.

(?:…) is a non-capturing group.
…+? is a non-greedy match (as few repetitions of … as possible)

(string1.+?)(?:'string2)

(string1.+?)'string2

See example: https://regex101.com/r/lFPSEM/3
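
As a rough illustration of the lazy pattern above, here is a minimal sketch in Python. Using Python's `re` module is an assumption on my part (the question only mentions regex101 and sed), and the helper name `capture_until_string2` is hypothetical. The sample lines are the four cases from the question.

```python
import re

# Lazy pattern from this answer: capture from "string1" up to,
# but not including, the next "'string2".
PATTERN = re.compile(r"(string1.+?)'string2")

# The four input cases from the question.
SAMPLES = [
    "['string1','string2','string3']",
    "['string1' , 'string2' , 'string3']",
    "['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']",
    "['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']",
]


def capture_until_string2(line):
    """Return group 1 of the first match, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        # repr() makes trailing spaces in the capture visible.
        print(repr(capture_until_string2(sample)))
```

For the first sample the captured group is `string1',`. Because `.+?` accepts any characters, the capture also spans any entries that happen to sit between string1 and string2, which may or may not be what is wanted.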

tmslnz
  • The `(?:'string2)` should not be inside a non-capturing group, there is only 1 branch inside the grouping construct here, and it is redundant. – Wiktor Stribiżew Nov 06 '16 at 11:47
  • You are right if you assume a consistent sequence of spaces, commas and single quotes is used. But your example would break as soon as that sequence is missing. I believe my approach reflects the OP's thinking of "from _this_ until _that_" more closely. – tmslnz Nov 06 '16 at 11:51
  • If it is that, I'd close the question as a dupe of http://stackoverflow.com/questions/12736074/regex-matching-between-two-strings-in-python, and the right pattern is `('string1.+?)'string2` – Wiktor Stribiżew Nov 06 '16 at 11:54
  • Yup. A redundant non-capturing group :) – tmslnz Nov 06 '16 at 11:56
  • The [`(string1.+?)'string2`](https://regex101.com/r/lFPSEM/5) will capture `"string1', 'string0.sssss', "` in `['string1', 'string0.sssss', 'string2','string3']` and I doubt it is what is required judging by the provided input. – Wiktor Stribiżew Nov 06 '16 at 12:00
  • "[…] Until it encounters the following `string2`". As long as the OP doesn't clarify further, I would refrain from assuming that is the case. Further, the input suspiciously looks like something that would better be parsed by other means than a regex… – tmslnz Nov 06 '16 at 12:02

To match a string that starts with ', followed by string1, then zero or more characters other than ', a closing ', and then optional whitespace, a comma, and again optional whitespace, you may use

'string1[^']*'\s*,\s*

See the regex demo.

Breakdown:

  • 'string1 - a literal char sequence 'string1
  • [^']* - zero or more (*) characters other than ' (due to the negated character class [^...])
  • ' - an apostrophe
  • \s* - 0+ whitespaces
  • , - a comma
  • \s* - 0+ whitespaces.
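
The breakdown above can be tried out with a minimal sketch in Python. Python's `re.sub` stands in for the substitution here as an assumption (the question itself uses sed), and the helper `remove_entry` with its `name` parameter is hypothetical. The sample lines are the four cases from the question.

```python
import re


def remove_entry(line, name="string1"):
    """Delete the first quoted entry starting with `name`,
    together with the comma and surrounding whitespace after it."""
    # 'name  +  [^']* (rest of the entry)  +  '  +  \s*,\s*
    pattern = r"'" + re.escape(name) + r"[^']*'\s*,\s*"
    # count=1 replaces only the first match, like sed's s/// without the g flag.
    return re.sub(pattern, "", line, count=1)


if __name__ == "__main__":
    samples = [
        "['string1','string2','string3']",
        "['string1' , 'string2' , 'string3']",
        "['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']",
        "['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']",
    ]
    for sample in samples:
        print(remove_entry(sample))
```

Since only the start of the entry is fixed, the pattern also matches any entry that merely begins with the given name, so `'string1[^']*'` would match `'string10:8080'` as well. In sed, `\s` may not be supported, and `[[:space:]]` is the POSIX alternative suggested in the comments under the question.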
Wiktor Stribiżew