5

I have a string of the form FOO_123_BAR.bazquux, where FOO and BAR are fixed strings, 123 is a number and bazquux is free-form text.

I need to perform a text transformation on this string: extract 123 and bazquux, increment the number, and then arrange them in a different string.
For example, FOO_123_BAR.bazquux → FOO=124 BAR=bazquux. (The actual transformation is more complex.)

Naturally, I can do this in a sequence of sed and expr calls, but it's ugly:

shopt -s lastpipe

in=FOO_123_BAR.bazquux
echo "$in" | sed -r 's|^FOO_([0-9]+)_BAR\.(.+)$|\1 \2|' | read number text
out="FOO=$((number + 1)) BAR=$text"

Is there a more powerful text processing tool that can do the job in a single invocation? If yes, then how?


Edit: I apologize for not making this clearer, but the exact structure of the input and output is only an example. I therefore prefer general solutions that work with any delimiters, or none at all, over solutions that depend on, e.g., the presence of underscores.

brian d foy
intelfx
  • Do you want a suggestion using bash regex or string manipulations? If not, retaining the bash tag doesn't make sense. – Inian Aug 06 '20 at 10:49
  • @Inian I'm certainly open to oneliners in pure bash :) – intelfx Aug 06 '20 at 12:15
  • wrt your edit that `I prefer general solutions that work with any delimiters or absence thereof` - what if you replace `FOO` and `BAR` with different "fixed strings" like `F.O` and `B/R'`? If you would expect the solution you accepted to keep working then you'd be disappointed. You can't get a general solution from 1 input example and an informal description like "fixed strings", and the solution you picked doesn't use "fixed strings" at all; it uses a regular expression because that's good enough to produce the expected output from the one sample input you provided. – Ed Morton Aug 08 '20 at 14:00
  • @EdMorton a regular expression can be trivially adapted to any `FOO` and `BAR`, or even to completely different inputs. The solution that uses awk to break on `_` and `.` as delimiters — can't. I hope this explains my logic in choosing the accepted answer. – intelfx Aug 10 '20 at 02:38
  • It's messy and cumbersome at best to try to make a regexp operate as if it were using strings (see https://stackoverflow.com/q/29613304/1745001) and you don't need either approach, you can just use literal strings, it's just not clear from your question which is the best approach for your problem. It sounds from your comment like a literal strings approach may have been the right approach. – Ed Morton Aug 10 '20 at 12:28

6 Answers

6

With GNU sed, you can execute the entire replacement string as an external command using the e flag.

$ s='FOO_123_BAR.bazquux'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
FOO=124 BAR=bazquux

To avoid conflicts with shell metacharacters, quote the unknown portions. Without quoting, the command fails; wrapping \2 in single quotes (\x27) fixes it:

$ s='FOO_123_BAR.$x(1)'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
sh: 1: Syntax error: "(" unexpected

$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\x27\2\x27/e'
FOO=124 BAR=$x(1)
Sundeep
  • I did not know about the `e` flag to the `s///` sed command. Thanks, this (alongside perl) is the cleanest solution (although a GNUism, but I don't care about that here). – intelfx Aug 06 '20 at 12:23
  • @intelfx cool, see [my tutorial for e flag](https://github.com/learnbyexample/learn_gnused/blob/master/gnu_sed.md#executing-external-commands) for more examples and details... although, I'd say that Ed Morton's take by processing the input as fields is the cleanest for this particular format. – Sundeep Aug 06 '20 at 13:15
  • I should've clarified this earlier — I do not like solutions that depend on a particular format, because their general utility is lower. – intelfx Aug 06 '20 at 13:53
  • @intelfx **every** text processing solution depends on a particular format, you just have to tell us what the format is and provide sample input/output that we can use to verify our potential solution works for **that** format and not some other similar but invalid format. – Ed Morton Aug 08 '20 at 13:52
5

Using any awk in any shell on every UNIX box and assuming none of your substrings contain _ or .:

$ s='FOO_123_BAR.bazquux'
$ echo "$s" | awk -F'[_.]' '{print $1"="$2+1,$3"="$4}'
FOO=124 BAR=bazquux
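
If the fixed parts may themselves contain `_` or `.`, one alternative is to treat them as literal strings instead of field separators, as the comments under the question suggest. The following is only a minimal sketch: the awk variables `pre` and `mid` are hypothetical names for the literal prefix and infix, and the output labels `FOO=` and `BAR=` are hard-coded for the sample.

```bash
s='FOO_123_BAR.bazquux'
out=$(printf '%s\n' "$s" | awk -v pre='FOO_' -v mid='_BAR.' '
  # Proceed only when the line starts with the literal prefix.
  index($0, pre) == 1 {
    rest = substr($0, length(pre) + 1)   # text after the prefix
    i = index(rest, mid)                 # position of the literal infix
    if (i > 1) {
      num  = substr(rest, 1, i - 1)      # text between prefix and infix
      text = substr(rest, i + length(mid))
      # Only transform when the middle part is purely numeric.
      if (num ~ /^[0-9]+$/) print "FOO=" num + 1, "BAR=" text
    }
  }')
echo "$out"
```

Because `index()` and `substr()` compare plain strings, characters such as `.` or `/` in `pre` and `mid` need no regex escaping. Note that `-v` still interprets backslash escapes in the assigned values.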
Ed Morton
4

You may do it with perl:

perl -pe 's|^FOO_([0-9]+)_BAR\.(.+)$|"FOO=" . ($1 + 1) . " BAR=" . $2|e' <<< "$in"

The ($1 + 1) expression increments the number captured in Group 1; the /e flag makes perl evaluate the replacement as code.

Wiktor Stribiżew
3

Could you please try the following, written and tested with the shown samples in GNU awk.

1st solution: using awk's match function.

echo "FOO_123_BAR.bazquux" | 
awk '
match($0,/FOO_[0-9]+_BAR/){
  split(substr($0,RSTART,RLENGTH),array,"_")
  print array[1]"="array[2]+1,array[3] "=" substr($0,RSTART+RLENGTH+1)
}'


2nd solution:

echo "FOO_123_BAR.bazquux" | 
awk '
BEGIN{
  FS=OFS="_"   # OFS must also be "_", or $2+=1 rebuilds the record with spaces
}
{
  $2+=1
  sub(/_/,"=")
  sub(/_/," ")
  sub(/\./,"=")
}
1'
RavinderSingh13
2

A pure bash one-liner would be

[[ $s =~ FOO_([0-9]+)_BAR\.(.*) ]] && echo "FOO=$((BASH_REMATCH[1] + 1)) BAR=${BASH_REMATCH[2]}"

assuming the variable s is set to the string that is being parsed before calling that line (s=FOO_123_BAR.bazquux).
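
For reuse, the same idea can be wrapped in a function. This is only a sketch: the function name `transform` is hypothetical, the pattern is anchored with `^` and `$` (the one-liner above is not), and `10#` is added so that a number with leading zeros is read as decimal rather than octal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "FOO_<number>_BAR.<text>" into "FOO=<number+1> BAR=<text>".
transform() {
  local s=$1
  # Keeping the pattern in a variable avoids quoting surprises inside [[ ... =~ ... ]].
  local re='^FOO_([0-9]+)_BAR\.(.*)$'
  # Return a non-zero status when the input does not match.
  [[ $s =~ $re ]] || return 1
  # 10# forces base 10, so a value such as 008 is not rejected as invalid octal.
  printf 'FOO=%d BAR=%s\n' "$((10#${BASH_REMATCH[1]} + 1))" "${BASH_REMATCH[2]}"
}

transform 'FOO_123_BAR.bazquux'
transform 'FOO_008_BAR.$x(1)'
```

The two calls print `FOO=124 BAR=bazquux` and `FOO=9 BAR=$x(1)`. Since the captured text is never passed back through a shell parser, metacharacters in it need no extra quoting.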

M. Nejat Aydin
1

Using variable substitution and word splitting:

$ in=FOO_123_BAR.bazquux
$ raw=(${in//_/ })
$ echo "${raw[0]}=$((raw[1] + 1)) ${raw[2]//./=}"
FOO=124 BAR=bazquux
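
Word splitting breaks as soon as the free-form text contains spaces, underscores or glob characters. A variant that avoids it, sketched here under the assumption that the literal prefix is `FOO_` and the literal infix is `_BAR.`, uses only prefix and suffix removal. The variable names `rest`, `number` and `text` are illustrative, and the sketch does not check that the middle part is actually numeric.

```bash
in='FOO_123_BAR.bazquux'
rest=${in#FOO_}            # strip the literal prefix
number=${rest%%_BAR.*}     # keep what precedes the first "_BAR."
text=${rest#*_BAR.}        # keep what follows the first "_BAR."
out="FOO=$((number + 1)) BAR=$text"
echo "$out"
```

These expansions are plain POSIX parameter expansion, so the snippet should also run in shells other than bash.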
Ivan