Say I have come up with a regex matching a piece of data; the regex contains 2 sed groups (sub-expressions enclosed in ( and )). Also say that this regex is duplicated 9 times to match a whole line. The problem I am facing is how to delete (in an elegant way) every second match against the regex.

SJU

1 Answer

Let's say you have the following string and want to remove the occurrences of bar:

foo bar foo bar foo bar

You can use the following sed command. Note the g option, which makes the substitution happen as many times as possible:

sed -r 's/([a-z]+) ([a-z]+)/\1/g' <<< 'foo bar foo bar foo bar'

Output: foo foo foo.

However, this would not work with a string where the number of words is not even. To make the above command work with such strings as well, I would make the second capturing group optional using the * quantifier:

sed -r 's/([a-z]+) ([a-z]+)*/\1/g' <<< 'foo bar foo bar foo bar foo'

Output: foo foo foo foo.
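
The same idea can be wrapped in a small shell function. This is only a sketch under a few assumptions: a POSIX shell, a sed that accepts -E (GNU and BSD sed both do), and single spaces between words. The name keep_odd_words is hypothetical. It differs from the commands above in that the separator and the second word are grouped, so both are optional as a unit, and it uses [[:lower:]] rather than [a-z].

```shell
#!/bin/sh
# keep_odd_words (hypothetical helper): keep the 1st, 3rd, 5th, ... word
# of each space-separated input line read from stdin.
keep_odd_words() {
    # \1 is the word to keep; ( [[:lower:]]+)? is the optional
    # "space plus following word" that gets dropped.
    sed -E 's/([[:lower:]]+)( [[:lower:]]+)?/\1/g'
}

# Even number of words
printf '%s\n' 'foo bar foo bar foo bar' | keep_odd_words
# Odd number of words
printf '%s\n' 'foo bar foo bar foo bar foo' | keep_odd_words
```

Piping the input with printf instead of using <<< keeps the sketch usable in shells that lack here-strings.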

hek2mgl
  • Good idea! Out of curiosity, how would you preserve only those matches against regex whose indices follow a more irregular pattern? I.e. how would you preserve matches 1, 3, 7, 15, 31, 63, etc.? – SJU Apr 17 '15 at 16:09
  • Can you give a full example? Including the input and the expected output? – hek2mgl Apr 17 '15 at 16:14
  • Use a `CSV` parser to handle CSV. This is not a job for a regex (as you see). `Python`, `Perl` or even `PHP` or whatever all have lib functions to read and parse CSV files. – hek2mgl Apr 17 '15 at 16:22
  • An aside: `[a-z]` can contain the upper case letter `X` (and many more - if you don't know why, google `locale` and `character class`) so `[[:lower:]]` is preferred. – Ed Morton Apr 17 '15 at 16:42
  • @EdMorton It was just an example in this case, I don't even know the exact input data, but thanks for the hint! Will use `[[:lower:]]` when I *really* want to match lowercase characters. Especially in German (for example) it is important. – hek2mgl Apr 17 '15 at 16:59
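
For the irregular index pattern asked about in the comments (keep matches 1, 3, 7, 15, ...), one possible approach is to swap sed for awk, since awk can count fields. The following is a sketch only, and it makes an assumption the thread never confirms: that each "match" is a whitespace-separated field. The function name keep_pow2_minus_1 is hypothetical.

```shell
#!/bin/sh
# keep_pow2_minus_1 (hypothetical helper): print only the fields whose
# 1-based index is 1, 3, 7, 15, ... (each index is 2 * previous + 1).
keep_pow2_minus_1() {
    awk '{
        out = ""
        next_keep = 1                      # first index to keep
        for (i = 1; i <= NF; i++) {
            if (i == next_keep) {
                out = out (out == "" ? "" : " ") $i
                next_keep = 2 * next_keep + 1
            }
        }
        print out
    }'
}

printf '%s\n' 'a b c d e f g h' | keep_pow2_minus_1
```

For real CSV input with quoted fields, the CSV-parser advice above still applies; this sketch does not handle quoting.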