Modify numeric values in strings from stdout

Question

I have input strings that I want to rewrite, modifying some of the numeric values along the way:

Input: src/main.tsx(2,31): error TS2304: Cannot find name 'foo'.

Desired output: src/main.tsx:1:30: error: TS2304: Cannot find name 'foo'.

Note that:

The numeric values have been reduced by 1.
error is dynamic. It can also be warning.
The command will have the input piped to it (tsc | MAGIC_HAPPENS_HERE). If there's an error then this command should pass the error along.

So far I have the following: sed -E "s/^([^(]+)\(([0-9]+),([0-9]+)\): ((warning)|(error)) (.*)/\1:\2:\3: \4: \7/"

This works except for the numeric manipulation. From what I've read, I believe sed isn't the right tool for the job. I looked at awk but hit a wall with the regex capture groups.

I'm using macOS. The command doesn't need to be portable. I'm happy to install additional tools using brew.

RavinderSingh13 · Accepted Answer · 2020-02-09T17:33:58.037

Could you please try the following?

awk '
match($0,/\([^)]*/){
  value=substr($0,RSTART+1,RLENGTH-1)
  num=split(value,array,",")
  for(i=1;i<=num;i++){
    val=(val?val":":"")array[i]-1
  }
  part_2=substr($0,RSTART+RLENGTH+1)
  sub(/error/,"error:",part_2)
  print substr($0,1,RSTART-1) ":" val part_2
  val=value=part_2=""
}'  Input_file

Output will be as follows.

src/main.tsx:1:30: error: TS2304: Cannot find name 'foo'.

Explanation: the same code with a detailed comment on each line.

awk '                                          ##Start the awk program.
match($0,/\([^)]*/){                           ##Match from "(" up to, but not including, ")" in the line.
  value=substr($0,RSTART+1,RLENGTH-1)          ##value is the text between the brackets: start at RSTART+1, length RLENGTH-1.
  num=split(value,array,",")                   ##Split value on commas into array; num is the number of elements.
  for(i=1;i<=num;i++){                         ##Loop over each number in array.
    val=(val?val":":"")array[i]-1              ##Subtract 1 from the number and append it to val, separated by ":".
  }
  part_2=substr($0,RSTART+RLENGTH+1)           ##part_2 is the rest of the line after the closing ")".
  sub(/error/,"error:",part_2)                 ##Replace the first "error" in part_2 with "error:".
  print substr($0,1,RSTART-1) ":" val part_2   ##Print the text before "(", then ":", val and part_2.
  val=value=part_2=""                          ##Reset val, value and part_2 so nothing carries over to the next line.
}'  Input_file                                 ##Name of the input file.
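
The question also asks for `warning` to be handled and for the command to sit at the end of a pipe. A minimal sketch of one way to adapt the same match/substr idea, assuming every diagnostic has the shape `path(line,col): word rest`; the function name `fix_positions` and the second sample line are made up for illustration, and lines that do not match are printed unchanged:

```shell
# Hypothetical helper (not part of the answer above): reads diagnostics on stdin.
fix_positions() {
  awk '
  match($0, /\([0-9]+,[0-9]+\): /) {
    head = substr($0, 1, RSTART - 1)            # path before "("
    pair = substr($0, RSTART + 1, RLENGTH - 4)  # the "line,col" text between the brackets
    rest = substr($0, RSTART + RLENGTH)         # everything after "): "
    split(pair, n, ",")                         # n[1] is the line, n[2] is the column
    sub(/^[a-z]+/, "&:", rest)                  # add ":" after the leading word (error, warning)
    print head ":" (n[1] - 1) ":" (n[2] - 1) ": " rest
    next
  }
  { print }                                     # pass non-matching lines through untouched
  '
}

# Made-up sample input; the warning line is illustrative only.
printf '%s\n' \
  "src/main.tsx(2,31): error TS2304: Cannot find name 'foo'." \
  "src/app.tsx(10,1): warning W0001: sample warning" |
fix_positions
```

Because the variables are reassigned on every matching line, nothing needs to be reset between lines.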

Almost there, thanks. It's missing a colon after `error`. Would you be able to add some comments to your answer? In its current form I am not able to understand how it works so cannot edit it. — Benedict Cohen, Feb 09 '20 at 17:24
@BenedictCohen, I have edited the answer as requested. One further change is possible: if `error` appears only once in the line, the substitution can be applied to the whole line instead of creating the `part_2` variable. I am still writing the explanation and will let you know in a few minutes. — RavinderSingh13, Feb 09 '20 at 17:29
@BenedictCohen, I have added a detailed level of explanation for code, kindly do check and lemme know if that helps you. — RavinderSingh13, Feb 09 '20 at 17:34

Jonathan Leffler · Answer 2 · 2020-02-09T18:05:39.427

I agree that sed is not the appropriate tool. Being old-fashioned (or maybe just unfashionable), I'd use Perl:

$ cat data
src/main.tsx(2,31): error TS2304: Cannot find name 'foo'.
$ perl -p -e 's/^(.*?)\((\d+),(\d+)\): (\w+) /sprintf("%s:%d:%d: %s: ", $1, $2-1, $3-1, $4)/e' data
src/main.tsx:1:30: error: TS2304: Cannot find name 'foo'.
$

The regex lazily matches everything up to "(nn,mmm): " followed by a 'word', capturing the text that precedes the brackets, the two numbers, and the word. It then uses the /e modifier ('evaluate the right side as an expression'; see Regexp Quote-Like Operators) to do the subtractions, with sprintf() formatting the result. The 'word' will capture error or warning or any other run of word characters (letters, digits, underscores) followed by a blank. You could use \S+ in place of \w+ to capture any sequence of non-space characters. I assume the separators are single blanks; you can use \s+ in place of the blanks, if need be, to match any non-empty sequence of white space. (The -p option simply means 'read lines from named files, or standard input if no files are named, do the actions in the -e '…script…' and print the result'.)

Tested with 5.18.4 (/usr/bin/perl on macOS Mojave 10.14.6) and 5.30.0.

If you have a process producing errors, then you need to ensure that the errors are sent to the Perl script — that's shell scripting rather than anything else.

tsc 2>&1 |
perl -p -e 's/^(.*?)\((\d+),(\d+)\): (\w+) /sprintf("%s:%d:%d: %s: ", $1, $2-1, $3-1, $4)/e'

If you need the standard output of the command (tsc in the amended question and the shell script fragment above) to go somewhere else, then you need to be careful (see also How to pipe stderr and not stdout), but maybe:

tsc 2>&1 >tsc.out |
perl -p -e 's/^(.*?)\((\d+),(\d+)\): (\w+) /sprintf("%s:%d:%d: %s: ", $1, $2-1, $3-1, $4)/e'

The pipe initially sets the standard output going to Perl; the 2>&1 sends standard error there too; the >tsc.out changes standard output so it goes to the file tsc.out, leaving standard error going to the pipe.
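
The order of the redirections can be checked without tsc. A small sketch, assuming a made-up function `emit` that stands in for the real command and `tr` standing in for Perl:

```shell
# Hypothetical stand-in for tsc: one line on stdout, one line on stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

tmp=$(mktemp)

# Same redirection order as above: stderr joins the pipe, stdout goes to the file.
piped=$(emit 2>&1 >"$tmp" | tr 'a-z' 'A-Z')
saved=$(cat "$tmp")
rm -f "$tmp"

echo "pipe received: $piped"
echo "file received: $saved"
```

Only the stderr line reaches the pipe (and is upper-cased by `tr`); the stdout line lands in the temporary file.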

I like this approach because it uses tools that I'm more familiar with (specifically `regex` and `printf`). The downside is that it swallows the error. I've updated the question to include more details. — Benedict Cohen, Feb 09 '20 at 17:59
I'm not sure what you mean by "swallows the error", but the basic Perl relays everything sent to standard input to standard output, modifying only lines that contain the pattern it matches. I've added some notes on the shell scripting needed so Perl can read the errors written on standard error by the previous command in the pipeline. I'm not clear if that is the source of your confusion. — Jonathan Leffler, Feb 09 '20 at 18:07
Sorry, I wasn't clear. By "swallows the error" I meant that the return value of `tsc` is non-zero but the return value of the perl command is 0. This means that the script reports as succeeding when it has actually failed. — Benedict Cohen, Feb 09 '20 at 19:03
In Bash, use [`set -o pipefail`](https://www.gnu.org/software/bash/manual/bash.html#The-Set-Builtin) (and not `shopt -s pipefail` as suggested previously). It causes a pipeline to return the non-zero status of the rightmost command that fails, so the pipeline only succeeds if every command succeeds. — Jonathan Leffler, Feb 09 '20 at 20:44
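
A minimal Bash sketch of the difference `set -o pipefail` makes, assuming a made-up function `fake_tsc` that prints a diagnostic and exits with status 2, and `cat` standing in for the Perl filter:

```shell
# Hypothetical stand-in for a tsc run that reports a diagnostic and fails.
fake_tsc() {
  echo "src/main.tsx(2,31): error TS2304: Cannot find name 'foo'."
  return 2
}

# Default behaviour: the pipeline status is that of the last command (cat).
without=0
fake_tsc | cat >/dev/null || without=$?

# With pipefail: the failing command's status becomes the pipeline status.
set -o pipefail
with=0
fake_tsc | cat >/dev/null || with=$?
set +o pipefail

echo "without pipefail: $without"
echo "with pipefail:    $with"
```

Without the option the pipeline reports 0 even though `fake_tsc` failed; with it the pipeline reports 2, so a calling script can see the failure.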
