-1

I want to use sed to find and replace a short string of text in a number of files.

Specifically the substitution I want to perform is:

sed -e '/35=R/s/|131=.*|/|131=$UNIQUE|/g' $f

This runs inside a bash script, where $f is the filename.

The sed command selects lines that contain the string 35=R and then uses a very simple expression to replace |131=.*| (anything after the |131=) with |131=$UNIQUE|.

This seems to work perfectly on some files but not on others.

Working example:

Before:

8=FIX.4.2|9=151|35=R|56=ABC|142=7848|50=STUFF|49=OTHERSTUFF|52=20250905-06:00:10.910|34=107|146=1|55=DE123|22=4|48=DE123|38=1|54=1|207=F|131=12ABC|10=243

After:

8=FIX.4.2|9=151|35=R|56=ABC|142=7848|50=STUFF|49=OTHERSTUFF|52=20250905-06:00:10.910|34=107|146=1|55=DE123|22=4|48=DE123|38=1|54=1|207=F|131=$UNIQUE|10=243

In other cases, however, the output is missing large blocks of text.

Failing example:

Before:

8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=1234abc|146=1|55=AB123|15=ZYX|22=4|38=1|48=AB123|54=2|207=STUFF|10=253

After:

8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=$UNIQUE|10=253

As you can see, it's missing everything following the pipe after 131=$UNIQUE. I'm fairly new to regular expressions and sed, so it's possible I'm misunderstanding the substitution part. Any pointers would be hugely appreciated.

Thank you.

Cyrus
Phill

3 Answers

1

Replace .* with [^|]* so that the match stops before the first |.

Cyrus
1

The .* expression is “greedy”. That means it matches as many characters as possible, so in your examples it runs all the way to the rightmost | symbol on the line. You should use this expression instead:

sed -e '/35=R/s/|131=[^|]*|/|131=$UNIQUE|/g' $f
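To see the difference, here is a small sketch that runs both patterns against the failing line from the question. The variable names (line, greedy, fixed) are only for illustration, and $UNIQUE stays literal because the sed script is single-quoted, as in the question.

```shell
#!/bin/sh
# Failing input line, copied from the question.
line='8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=1234abc|146=1|55=AB123|15=ZYX|22=4|38=1|48=AB123|54=2|207=STUFF|10=253'

# Greedy: .* runs on to the last | on the line, swallowing fields 146 to 207.
greedy=$(printf '%s\n' "$line" | sed -e '/35=R/s/|131=.*|/|131=$UNIQUE|/g')

# Negated class: [^|]* stops at the first | after the 131= value.
fixed=$(printf '%s\n' "$line" | sed -e '/35=R/s/|131=[^|]*|/|131=$UNIQUE|/g')

printf 'greedy: %s\n' "$greedy"
printf 'fixed:  %s\n' "$fixed"
```

The greedy line should reproduce the truncated output from the question, while the fixed line keeps every field after 131=.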
oneastok
0

You were (un)lucky with your first example, because there was only one | character after the division with 131= in it, so the greedy match had nowhere further to go.

The problem here is that .* matches any sequence of characters, including any vertical bar (|) characters, so you need to exclude | from what you're matching. Instead of .* use [^|]*, which matches any run of characters other than |.

Also, | can have a special meaning, so you might need to escape it (\|) when it's not in brackets.

But even then, you're not out of the woods. The 131= division can apparently move around on the line, meaning it might be first or it might be last. You can accommodate it being last by just eliminating the closing |:

sed -e '/35=R/s/|131=[^|]*/|131=$UNIQUE/g' $f

(I tested this with Visual Studio search and replace, because it's handy, and sed isn't. But it did what you wanted.)

To handle the case where the 131= division might be the first one on the line, you need to add another expression:

sed -e '/35=R/s/|131=[^|]*/|131=$UNIQUE/g' -e '/35=R/s/^131=[^|]*/131=$UNIQUE/g' $f
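As a rough check of the two-expression version, the sketch below feeds it short made-up lines with the 131= division first, in the middle, and last. The helper name replace_131 and the sample lines are hypothetical. Note that the sed script here is double-quoted so the shell substitutes the helper's argument; whether $UNIQUE in the question was meant to be a shell variable is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of tag 131 on lines containing 35=R.
# Assumes the new value contains no '/', '&', '\' or newline, since those
# are special inside a sed replacement.
replace_131() {
    sed -e "/35=R/s/|131=[^|]*/|131=$1/g" \
        -e "/35=R/s/^131=[^|]*/131=$1/"
}

# Made-up short lines covering each position of the 131= division.
first=$(printf '%s\n' '131=old|35=R|10=1' | replace_131 NEW)
middle=$(printf '%s\n' '35=R|131=old|10=1' | replace_131 NEW)
last=$(printf '%s\n' '35=R|10=1|131=old' | replace_131 NEW)
# A line without 35=R is left untouched by the address filter.
other=$(printf '%s\n' '35=D|131=old|10=1' | replace_131 NEW)

printf '%s\n' "$first" "$middle" "$last" "$other"
```

With single quotes, as in the commands above, $UNIQUE is written into the file literally; double quotes let the shell expand it before sed sees the script.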
Spencer
  • Sed regexes are Basic REs unless you invoke sed with `-E` (or `-r`, deprecated). Just like grep. So backslashing `|` will make it special. (That's probably not the same as VS search & replace.) – rici Sep 18 '19 at 20:34
  • @rici That's Visual Studio for you. – Spencer Sep 18 '19 at 20:38
  • I think it's more a quirk of sed; `|` meaning alternative is pretty standard these days. But old utilities like sed and grep use BREs, in which you have to write `\|` if you want alternation; `|` is just a regular character. So your answer is incorrect in that point. – rici Sep 18 '19 at 21:17
  • @rici I'm just going to add weasel words if that's OK, to cover all possibilities. – Spencer Sep 18 '19 at 21:40
  • Up to you, but the weasel is still wrong :-) https://tio.run/##LY1LC4JQEIX/TKsg7525M75gFgVKbopMQXtBL9y1kXbz32@m7g7nOx@nf79W3efrvXEspenNRcGBnG96XZoxLupdcagz03kfS140AQWoiQBF@lfUkQA7C0qJHKs6z5VR0EJiE8sr4BSjlClwlpVD2VfbrJxmADwJ7fgD6Oj@eCpQKKDMst4MlQ6jU9soopC6eCAUz4RJUNFG8ytYQXY/ – rici Sep 18 '19 at 22:33