combine 2 awk or sed statements into one and save the existing file

Question

My objective is to modify only the line that begins with data; there is only one such line. In this line, I am replacing characters 51-80 and characters 97-126. I would rather have a one-liner with sed, because then I can use the -i flag to save the file in place.

Here is the line in the file I am modifying:

data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'

Here are my two awk statements that modify the start_date and end_date portions of the string. I've included the two variables that I use to get the current time minus 15 minutes for the start_date and the current time for the end_date:

ctime=`date +%Y-%m-%dT%H:%M:%S.%3N-00:00`
fifteen_min_ago=`date -d "15 mins ago" +%Y-%m-%dT%H:%M:%S.%3N-00:00`

awk '$1 ~ /^data/{printf "%s%*s%s\n",substr($0,1,m-1),n-m,"'$fifteen_min_ago'",substr($0,n)}' m=51 n=80 file
awk '$1 ~ /^data/{printf "%s%*s%s\n",substr($0,1,m-1),n-m,"'$ctime'",substr($0,n)}' m=97 n=126 file

This works beautifully but I need to save the file. If I run the two awk commands separately, each one only modifies its own field. So I either need to combine them and save with awk, or I need a sed one-liner that I can then use -i with.

Thanks.

The input looks like a broken json. Can't you get the real json and use `jq` to extract the dates? — choroba, Jun 13 '20 at 16:42
It isn't broken. This variable is what I will be sending off in a POST to download information from an API. The only thing I don't like about my awk statement is that it seems to print out another set of my variables. I've never heard of jq. But what I am doing is trying to modify my other python script because I've spent days trying to work around their ridiculous time functions. I have used .replace() on data and got the exact string I wanted with the exact time, yet it doesn't post this data. I don't get how an identical string is scrutinized. — Jason Smith, Jun 13 '20 at 16:47
Don't let shell variables (`fifteen_min_ago` and `ctime`) expand to become part of an awk script as that leads to insidious, cryptic errors, see [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script). If you care about `-i` - that's only available in GNU and OSX/BSD sed, GNU awk has `-i inplace`. — Ed Morton, Jun 13 '20 at 18:41

score 1 · Accepted Answer · edited Jun 13 '20 at 19:15

You can inline a multi-line awk script, or you can put the two statements in a file. Use distinct variable names for each 'pass':

awk '
$1 ~ /^data/ {
    # no "\n" in the format: the print below already ends the line
    $0 = sprintf("%s%*s%s",substr($0,1,m-1),n-m,"'"$fifteen_min_ago"'",substr($0,n))
    $0 = sprintf("%s%*s%s",substr($0,1,m2-1),n2-m2,"'"$ctime"'",substr($0,n2))
}
{ print }
' m=51 n=80 m2=97 n2=126 file > file.new &&
mv file.new file

Note that there are other (simpler) ways to achieve the replacement implemented in the question. This one is the most similar to the approach described in the question.

With sed the replacements are easier:

ctime=$(date +%Y-%m-%dT%H:%M:%S.%3N-00:00)
fifteen_min_ago=$(date -d "15 mins ago" +%Y-%m-%dT%H:%M:%S.%3N-00:00)

sed -e 's/"start_date": "[^"]*"/"start_date": "'"$fifteen_min_ago"'"/' \
    -e 's/"end_date": "[^"]*"/"end_date": "'"$ctime"'"/' < file > file.new && mv file.new file
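
For the in-place save the question asks about, the same two substitutions can be run with -i. This is only a minimal sketch, assuming GNU sed (BSD/macOS sed needs an argument after -i). The scratch file, its second line, the names start_ts and end_ts, and the fixed timestamps are made up for illustration, and the /^data/ address is an addition that leaves every other line untouched.

```shell
# Scratch copy of the input; the path and the second line are illustrative only.
file=$(mktemp)
cat > "$file" <<'EOF'
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'
note = '"start_date": "leave me alone"'
EOF

# Fixed values keep the sketch reproducible; in practice they would come from date.
start_ts='2020-06-13T13:01:19.625-00:00'
end_ts='2020-06-13T13:16:18.624-00:00'

# /^data/ limits both substitutions to the data line; -i (GNU sed) saves in place.
sed -i \
    -e '/^data/s/"start_date": "[^"]*"/"start_date": "'"$start_ts"'"/' \
    -e '/^data/s/"end_date": "[^"]*"/"end_date": "'"$end_ts"'"/' \
    "$file"

cat "$file"
```

The shell variables are double-quoted where they leave the single-quoted script, so the shell does not split or glob their values.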

Also - with `cmd '"foo"'$var'"bar"'` you're removing the quoting that should be present by default from the shell variables and so inviting the shell to do globbing, word splitting, and filename expansion on them. Never do that unless you have a very specific (and rare!) purpose in mind. If you were going to let a shell variable expand to become part of a script then it'd be `cmd '"foo"'"$var"'"bar"'` and while that may be the best you can do in sed, that's the wrong way to access the value of a shell variable in awk (again, except in very rare specific situations which isn't the case here). — Ed Morton, Jun 13 '20 at 18:55
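
The advice in the comment above can be sketched as follows: the timestamps are handed to awk with -v, so they never become part of the script text. This is an illustration rather than anyone's posted solution; the scratch file, the names start_ts and end_ts, and the fixed timestamps are assumptions, while the offsets are the ones from the question.

```shell
# Scratch copy of the input; the path and the second line are illustrative only.
file=$(mktemp)
cat > "$file" <<'EOF'
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'
other = 'untouched'
EOF

# Fixed values keep the sketch reproducible; in practice they would come from date.
start_ts='2020-06-13T13:01:19.625-00:00'
end_ts='2020-06-13T13:16:18.624-00:00'

# Both replacements happen in one pass; awk sees the timestamps as ordinary variables.
awk -v start_ts="$start_ts" -v end_ts="$end_ts" \
    -v m=51 -v n=80 -v m2=97 -v n2=126 '
/^data/ {
    $0 = substr($0, 1, m - 1)  start_ts substr($0, n)    # characters m..n-1 replaced
    $0 = substr($0, 1, m2 - 1) end_ts   substr($0, n2)   # characters m2..n2-1 replaced
}
{ print }
' "$file" > "$file.new" && mv "$file.new" "$file"

cat "$file"
```

Note that awk processes backslash escapes in -v values, which is harmless for timestamps like these. The fixed offsets only stay valid because both replacements have the same length as the text they replace.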

score 1 · Answer 2 · answered Jun 13 '20 at 19:10

Since you're using GNU date and GNU sed you must have access to GNU awk in which case this is all you need:

awk -i inplace '
BEGIN {
    ctime = strftime("%Y-%m-%dT%H:%M:%S.%3N-00:00")
    fifteen_min_ago = strftime("%Y-%m-%dT%H:%M:%S.%3N-00:00",systime()-(15*60))
}
match($0,/(^\s*data.*"start_date":[^"]*")([^"]+)(.*end_date":[^"]*")([^"]+)(.*)/,a) {
    $0 = a[1] fifteen_min_ago a[3] ctime a[5]
}
{ print }
' file

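For example (note that gawk's strftime() does not expand GNU date's %3N, so it appears in the output below as a literal .3N):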

$ cat file
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'

$ awk -i inplace '
BEGIN {
    ctime = strftime("%Y-%m-%dT%H:%M:%S.%3N-00:00")
    fifteen_min_ago = strftime("%Y-%m-%dT%H:%M:%S.%3N-00:00",systime()-(15*60))
}
match($0,/(^\s*data.*"start_date":[^"]*")([^"]+)(.*end_date":[^"]*")([^"]+)(.*)/,a) {
    $0 = a[1] fifteen_min_ago a[3] ctime a[5]
}
{ print }
' file

$ cat file
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-13T13:56:46.3N-00:00",\n"end_date": "2020-06-13T14:11:46.3N-00:00",\n"product": "SAEP",\n"limit":1000\n}'

markp-fuso · Answer 3 · 2020-06-13T18:04:44.880

It is possible to use sed in conjunction with the OP's offsets (m and n), though we'll need to do a bit of math to get the correct offsets for use by sed.

For the sake of this example we'll use the following dataset:

$ cat xx
         1         2         3
data 6789012345678901234567890
abcdefghijklmnopqrstuvwxyz

And we'll apply the following replacements:

on a line that starts with ^data ...
replace positions 10-12 with XXX and ...
replace positions 20-25 with YYYYYY

To do this with a single sed invocation we'll need 2x sets of offsets:

m=10 ; n=12
o=20 ; p=25
r1=XXX
r2=YYYYYY

But before we get to the sed script we need to consider:

for the first sed pattern match we want to keep positions 1-9 and 13-EOL, and replace characters starting @ position 10 with a length of 3
for the second sed pattern match we want to keep positions 1-19 and 26-EOL, and replace characters starting @ position 20 with a length of 6

We can obtain our sed-specific offsets like such:

m2=$((m-4-1))    # ( 10 - length('data') - 1 ) = 5 ; so length(data)=4 +  5 =  9 = end of first set of characters to keep
n2=$((n-m+1))    # ( 12 -             10 + 1 ) = 3
o2=$((o-4-1))    # ( 20 - length('data') - 1 ) = 15; so length(data)=4 + 15 = 19 = end of first set of characters to keep
p2=$((p-o+1))    # ( 25 -             20 + 1 ) = 6

We're now ready to look at the sed solution:

$ set -xv     # echo the `sed` command with all variables substituted with values
$ sed -E "s/(^data.{${m2}}).{${n2}}(.*)$/\1${r1}\2/g; s/(^data.{${o2}}).{${p2}}(.*)$/\1${r2}\2/g" xx

Where:

set -xv - allows us to debug the follow-on sed command; set +xv turns debugging off
(^data.{${m2}}) - for lines that start with ^data, store the first 9 characters in buffer #1; length of data + m2=5
.{${n2}} - match the next 3 characters; to be replaced with contents of r1
(.*)$ - match the rest of the line and store in buffer #2
\1${r1}\2 - replace the line with buffer #1 + ${r1} + buffer #2
(^data.{${o2}}) - for lines that start with ^data, store the first 19 characters in buffer #1; length of data + o2=15
.{${p2}} - match the next 6 characters; to be replaced with contents of r2
(.*)$ - match the rest of the line and store in buffer #2
\1${r2}\2 - replace the line with buffer #1 + ${r2} + buffer #2

When running the above we should get:

+ sed -E 's/(^data.{5}).{3}(.*)$/\1XXX\2/g; s/(^data.{15}).{6}(.*)$/\1YYYYYY\2/g' xx
         1         2         3
data 6789XXX3456789YYYYYY67890
abcdefghijklmnopqrstuvwxyz

Where:

the line starting with + sed is showing us the actual sed command with all variables plugged into the mix (this is the result of having run set -xv beforehand)
the rest of the output is our input file with the desired string replacements

Last but not least we can write all of this back to the original file by including the -i flag:

$ set +xv      # turn off debugging
$ sed -i -E "s/(^data.{${m2}}).{${n2}}(.*)$/\1${r1}\2/g; s/(^data.{${o2}}).{${p2}}(.*)$/\1${r2}\2/g" xx
$ cat xx
         1         2         3
data 6789XXX3456789YYYYYY67890
abcdefghijklmnopqrstuvwxyz
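
The offset arithmetic above can also be wrapped in a small helper. The sketch below is a variation, not the exact commands from this answer: span_expr is a hypothetical function name, and it selects the line with a /^data/ address instead of folding data into the pattern, so the offsets need no adjustment for the length of data.

```shell
# Sample input from the answer, written to a scratch file (path is illustrative).
xx=$(mktemp)
cat > "$xx" <<'EOF'
         1         2         3
data 6789012345678901234567890
abcdefghijklmnopqrstuvwxyz
EOF

# span_expr START END TEXT (hypothetical helper): print a sed -E command that
# replaces characters START..END (1-based, inclusive) of a line with TEXT.
# TEXT must not contain "/", "&" or "\" because it is pasted into the s command.
span_expr() {
    printf 's/^(.{%d}).{%d}/\\1%s/' "$(( $1 - 1 ))" "$(( $2 - $1 + 1 ))" "$3"
}

# The /^data/ address picks the line; both replacements run in one sed call.
result=$(sed -E -e '/^data/{' \
                -e "$(span_expr 10 12 XXX)" \
                -e "$(span_expr 20 25 YYYYYY)" \
                -e '}' "$xx")
printf '%s\n' "$result"
```

Adding -i to the sed call (and dropping the command substitution) would write the result back to the file, as in the answer.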

I'm going to have to come back later and try this for the sake of knowing more sed but thanks for this addition! — Jason Smith, Jun 13 '20 at 19:33

markp-fuso · Answer 4 · 2020-06-13T20:09:03.757

Here's an awk solution based on the use of a double quote (") as the delimiter ...

A sample data file:

$ cat yy
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'
XXXX = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'

And our replacement strings:

$ ctime=`date +%Y-%m-%dT%H:%M:%S.%3N-00:00`
$ fifteen_min_ago=`date -d "15 mins ago" +%Y-%m-%dT%H:%M:%S.%3N-00:00`
$ echo ${ctime}
2020-06-13T13:16:18.624-00:00
$ echo ${fifteen_min_ago}
2020-06-13T13:01:19.625-00:00

Using the double quote (") as our delimiter we see the following replacements:

field #8 ($8) - replace with ${fifteen_min_ago}
field #12 ($12) - replace with ${ctime}

Pulling this all together:

$ awk -F'"' '                              # set input field separator == double quote
BEGIN    { OFS = FS }                      # set output field separator == input field separator
/^data/  { $8  = "'${fifteen_min_ago}'"    # for lines starting with ^data, replace fields 8
           $12 = "'${ctime}'"              # and 12 with our variables
           print                           # print the current line
           next                            # skip to next line of input
         }
         { print }                         # for all other lines just print the entire line
' yy

Running the above gives us:

data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-13T13:01:19.625-00:00",\n"end_date": "2020-06-13T13:16:18.624-00:00",\n"product": "SAEP",\n"limit":1000\n}'
XXXX = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'

Since awk cannot (easily) overwrite the input file the easiest solution is to write the awk output to a new file and then rename as desired (see dash-o's answer for an example). [EDIT: Per Ed Morton's comment: GNU awk has a -i flag that functions the same as GNU sed -i.]
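
The write-then-rename step can be sketched with the same field-based replacement. This is a minimal illustration that should work with any POSIX awk; the scratch paths, the names start_ts and end_ts, and the fixed timestamps are assumptions, and the values are passed with -v rather than expanded into the script.

```shell
# Scratch copy of the sample data; the path is illustrative only.
yy=$(mktemp)
cat > "$yy" <<'EOF'
data = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'
XXXX = '{\n"feature_name": "ALL",\n"start_date": "2020-06-07T08:34:00.000-06:00",\n"end_date": "2020-06-08T13:35:00.000-06:00",\n"product": "SAEP",\n"limit":1000\n}'
EOF

# Fixed values keep the sketch reproducible; in practice they would come from date.
start_ts='2020-06-13T13:01:19.625-00:00'
end_ts='2020-06-13T13:16:18.624-00:00'

# Write to a temporary file first, then rename it over the original on success.
tmp=$(mktemp)
awk -F'"' -v OFS='"' -v start_ts="$start_ts" -v end_ts="$end_ts" '
/^data/ { $8 = start_ts; $12 = end_ts }   # fields 8 and 12 hold the two timestamps
{ print }                                 # every line is printed, changed or not
' "$yy" > "$tmp" && mv "$tmp" "$yy"

cat "$yy"
```

One thing to keep in mind with this pattern is that the renamed file carries the temporary file's permissions and ownership, not those of the original.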

This is much cleaner than my original one, and it makes sense from a programming perspective. I've used awk in many ways so I should have known about the field delimiter! However, I noticed that my python script would still complain about the current time no matter what, so to solve that I simply changed ctime to 1 minute ago. Less of a headache. Thanks a lot for the help — Jason Smith, Jun 13 '20 at 19:32
