How to match a string not followed by a word using sed

Question

I need to delete every occurrence of a hyphen followed by a space, but only when the space is not followed by the word "og". Example file:

Kultur- og idrettsavdelinga skapar nyska- pande kunst og utvik- lar samfunnet

I tried a negative lookahead:

sed -e 's/- (?!og)//g'

but it doesn't work. What I want is something like this:

Kultur- og idrettsavdelinga skapar nyskapande kunst og utviklar samfunnet.

Any ideas?

AFAIK, `sed` does not support lookaheads or lookbehinds (**[source](http://stackoverflow.com/questions/12176026/whats-wrong-with-my-lookahead-regex-in-gnu-sed)**). You can use `perl` instead. – rock321987, Jul 01 '16 at 15:40

Casimir et Hippolyte · Answer 1 · 2016-07-01T15:47:05.953

The lookahead feature isn't available with sed, but you can describe all possibilities:

sed -e 's/\(- \(- \)*\)\([^o]\|$\|o\([^g]\|$\)\)/\3/g'

You can test it with: - - - - og - - oa - o => - og oa o
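
To try that test string yourself, something like the following should do. It assumes GNU sed, because `\|` is a GNU extension in basic regular expressions; the `result` variable is only there for illustration.

```shell
# Feed the test string from the answer through the same substitution.
# Assumes GNU sed: \| (alternation) is not part of POSIX basic regex syntax.
result=$(printf '%s\n' '- - - - og - - oa - o' |
  sed -e 's/\(- \(- \)*\)\([^o]\|$\|o\([^g]\|$\)\)/\3/g')
printf '%s\n' "$result"
```

Per the answer, this should print `- og oa o`.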

edited Jul 01 '16 at 15:47

answered Jul 01 '16 at 15:41

Casimir et Hippolyte


That'll work for `og` ("and" in Norwegian), but he'll probably need to look for `eller` ("or") as well... – Kusalananda Jul 01 '16 at 15:46
You're right about that! I need to look for "eller" as well. – Arild Noven Jul 01 '16 at 15:54

Jedi · Answer 2 · 2016-07-01T16:03:11.990

You can also use a chain of sed substitutions: first replace `- og` with a nonsensical placeholder (like booogabooga), then delete the remaining `- ` strings, then turn the placeholder back into `- og`.

sed -e 's/- og/booogabooga/g; s/- //g; s/booogabooga/- og/g'

Some versions of sed may need:

sed -e 's/- og/booogabooga/g' -e 's/- //g' -e 's/booogabooga/- og/g'

This can be slower and more painful, especially if you have multiple replacements as @Kusalananda suggests, but it is easier to understand.
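
A sketch of how the same chain might be extended to also protect `- eller`. The sample line and the second placeholder, `eeellerbooga`, are made up for illustration, and both placeholders are assumed never to occur in the real input.

```shell
# Hypothetical sample line containing both protected words.
input='Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst og utvik- lar- eller samfunnet'

# 1. Hide "- og" and "- eller" behind placeholders (assumed absent from the input).
# 2. Delete every remaining "- ".
# 3. Turn the placeholders back into the original text.
result=$(printf '%s\n' "$input" | sed \
  -e 's/- og/booogabooga/g' \
  -e 's/- eller/eeellerbooga/g' \
  -e 's/- //g' \
  -e 's/eeellerbooga/- eller/g' \
  -e 's/booogabooga/- og/g')
printf '%s\n' "$result"
```

Each extra word to protect costs two more `-e` expressions, which is the tedium mentioned above.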

edited Jul 01 '16 at 16:03

answered Jul 01 '16 at 15:58

Jedi


The safe, idiomatic approach is to create a string (or strings) that can't exist in the input, not try to guess one. See http://stackoverflow.com/a/38153467/1745001. – Ed Morton Jul 01 '16 at 20:37
@EdMorton true. Now that I Google it, there are incredibly [174 results for "booogabooga"](https://www.google.com/search?q=%22booogabooga%22) which I thought I just made up. – Jedi Jul 01 '16 at 20:41
Yeah, unfortunately it's not even 100% safe to use `$'\n'` as the temp replacement string, in case someone is using hold space/buffer/time warp voodoo conjurations to force sed to do multi-line stuff. – Ed Morton Jul 01 '16 at 20:49

Ed Morton · Accepted Answer · 2016-07-01T20:34:25.627

Given this input file (I added some `- eller` occurrences, since you said in a comment that you need to handle them too):

$ cat file
Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst og utvik- lar- eller samfunnet

here's the common, idiomatic sed approach:

$ sed 's/a/aA/g; s/- og/aB/g; s/- eller/aC/g; s/- //g; s/aC/- eller/g; s/aB/- og/g; s/aA/a/g' file
Kultur- og idrettsavdelinga skapar- eller nyskapande kunst og utviklar- eller samfunnet

The above works by first turning every `a` (or any other character you like that's not in your target strings) into `aA`. We can then turn the strings we're interested in, `- og` and `- eller`, into `a` followed by some other character, e.g. `aB` and `aC`. At that point we know the only occurrences of `aB` and `aC` in the input are the newly transformed `- og` and `- eller`, since every pre-existing `a` is now `aA`.

Now we can just remove all remaining `- ` strings from the file, then convert each `aC` back to `- eller` and each `aB` back to `- og`, and finally turn every `aA` back into the original `a`.
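
To see why the initial `a` to `aA` step matters, here is a small sketch that runs the same command on a made-up line that already contains the literal text `aB`. The sample input and the `result` variable are hypothetical, chosen only to exercise the collision case.

```shell
# Hypothetical input that already contains "aB", the same text used as a marker.
input='aB- og nyska- pande'

# Same substitution chain as in the answer: escape every "a" first, so that the
# markers aB and aC can only come from "- og" and "- eller".
result=$(printf '%s\n' "$input" |
  sed 's/a/aA/g; s/- og/aB/g; s/- eller/aC/g; s/- //g; s/aC/- eller/g; s/aB/- og/g; s/aA/a/g')
printf '%s\n' "$result"
```

The pre-existing `aB` becomes `aAB` in the first step, so it is never mistaken for a marker and comes back unchanged: the output is `aB- og nyskapande`.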

potong · Answer 4 · 2016-07-03T08:05:27.517

This might work for you (GNU sed):

sed -r 's/(- (og|eller))|- /\1/g' file

This relies on alternation to re-replace specific cases and the empty backreference to replace the general case.
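
A quick sketch of that command on a sample line, assuming GNU sed. It uses `-E`, which GNU sed accepts as a synonym for `-r`; whether other sed implementations treat the unmatched `\1` as empty is an assumption I have not verified. The sample line and variable names are only for illustration.

```shell
# Hypothetical sample line containing both protected words.
input='Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst og utvik- lar- eller samfunnet'

# First alternative: "- og" or "- eller" is captured in \1 and written back as-is.
# Second alternative: a bare "- " leaves \1 empty, so the match is deleted.
result=$(printf '%s\n' "$input" | sed -E 's/(- (og|eller))|- /\1/g')
printf '%s\n' "$result"
```

Further words can be protected by adding them to the inner group, e.g. `(og|eller|...)`.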

edited Jul 03 '16 at 08:05

answered Jul 03 '16 at 07:47

potong

