How to update a multichracter delimited field in a file in bash?

Question

I'm trying to match a certain field and update its data from a file delimited with multiple characters. I'm using this to create an imitation of SQL's UPDATE. This is part of a bigger project to create a mini DBMS with bash.

What I tried:

sed "s/\^\_\^/:/g" $file_path | cut -d: -f1 | grep -nw SAR | sed "s/\^\_\^/:/g" | cut -d: -f2 | sed -i "s/$match/$update/g"

My issue is I am unable to use sed -i to update only these specific columns found as you can't pipe into it.

The delimiter being used is : ^_^

Example of data file:

'EGP'^_^'Egypt'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'Europe'^_^'SAR'
'SAR'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

my $match can be SAR for example and $update would be USD

Expected change to data file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

If there is a different approach that is better, that is also welcome as I am fairly new to bash scripting.

If you have fields `awk` is probably a better option than `sed`. See the `awk` `FS` variable (Field Separator) in the `awk` documentation. — Renaud Pacalet, Jan 06 '22 at 15:18

score 4 · Accepted Answer · edited Jan 06 '22 at 17:47

4

Better to use awk here:

awk -v s="'SAR'" -v q="'USD'" -F'\\^_\\^' -v OFS='^_^' '$1==s {$1=q} 1' file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

Explanation:

-v s="'SAR'" sets command line variable s='SAR'
-v q="'USD'" sets command line variable s='USD'
-F '\\^_\\^' sets input field separator as ^_^. We need to escape ^ as that is a special regex meta character and we need to double escape it because we're using it in a string that is later converted to a regexp (field separator).
-v OFS='^_^' sets output field separator to ^_^
$1 == s compares first field to 'SAR'
$1 = q sets $1 to variable 'USD'

edited Jan 06 '22 at 17:47

Ed Morton

188,023
17
78
185

answered Jan 06 '22 at 15:20

anubhava

761,203
64
569
643

1

A bit simpler, maybe: `awk -v s="'SAR'" -v q="'USD'" -F'\\^_\\^' '$1==s {$1=q} 1' file`. – Renaud Pacalet Jan 06 '22 at 15:44
Thanks, using single quotes around `-F` reduces escaping. However we still need to set `OFS` – anubhava Jan 06 '22 at 15:46
1

Ooops, yes, I forgot it, sorry. – Renaud Pacalet Jan 06 '22 at 15:47
Quite elegant, thank you very much! One last question though as I am not that familliar with awk. What does that trailing 1 at the end of the awk script do? – M.Elsayed Jan 06 '22 at 16:04
1

Trailing `1` (or any non-zero value) prints each line of input. – anubhava Jan 06 '22 at 16:17

The fourth bird · Answer 2 · 2022-01-06T16:36:00.743

2

Using sed you could use $match at the start of the string ^, and change it with $update:

match="'SAR'"
update="'USD'"
sed "s/^$match\^_\^/$update^_^/" file

Output

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

See a bash demo.

edited Jan 06 '22 at 16:36

answered Jan 06 '22 at 15:34

The fourth bird

154,723
16
55
70

2

A bit simpler, maybe: `match="'SAR'"; update="'USD'"; sed "s/^$match\\^_\\^/$update^_^/" file`. – Renaud Pacalet Jan 06 '22 at 15:45
@RenaudPacalet Using the single quotes in the variables you mean right? That would make it more readable indeed. – The fourth bird Jan 06 '22 at 15:47
This definitely works, only question is what happens if I want to update the second field? The ^ only matches the start of a given pattern. – M.Elsayed Jan 06 '22 at 15:57
@Stoon If you want to match the nth match, you can use a quantifier and a capture group to skip n fields before matching what you want to replace. It would be like `sed "s/^$\('[^']*'\^_\^$\{2\}\)$match\^_\^/\1$update^_^/g" file` See [this example](https://tio.run/##S0oszvj/PzexJDnDVkk92DFIXYmrtCAlsSQVyA0NdgFyi1NTFJSK9eNiNGI01KPj1GO11GPi4mPiYjRjqo1iamM0VcDawWL6MYYqEO1x8XH66UoKNjY2ChBzgQLqrqVF@QWpYCZMSOn/fwA) but in that case awk would be a better fit. – The fourth bird Jan 06 '22 at 16:37
1

I see, didn't realize you can do that with sed. A bit convoluted but works. I guess I'll proceed with awk like you and the accepted answer suggested :) – M.Elsayed Jan 06 '22 at 17:09

score 1 · Answer 3 · answered Jan 07 '22 at 01:32

With your shown samples, please try following awk program.

awk -v matchvalue="'SAR'" -v updatedvalue="'USD'" '
match($0,"^"matchvalue"\\^_\\^"){
  print updatedvalue"^_^" substr($0,RSTART+RLENGTH)
  next
}
1
'  Input_file

Explanation: Creating 2 variables named matchvalue and updatedvalue with 'SAR' and 'USD' values shown by OP in samples. Then in main program using match function to match if a line starts with 'SAR'^_^ then print new value in its place and then print rest of the line, if line doesn't start from 'SAR'^_^ then simply print it.

score 1 · Answer 4 · answered Jan 07 '22 at 03:13

Here's what I ended up doing:

escaped_delm=$(echo $curr_delim | sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g')
awk -F"$escaped_delm" -v a_col_update=$update_field -v pick=$p_tmp_field -v a_del="$match" -v a_ins="$insert" -v OFS="$curr_delim" '$pick==a_del {$a_col_update=a_ins} 1' $t_path > tmp && mv tmp $t_path

The biggest issue I faced is that I am passing my delimiter ^_^ as a variable to awk. My script assumes that I don't know what the variable is and that it is passed to it at the beginning of the script. This causes an issue, just like the first answer suggested the -F option for awk thinks that ^ is a regex argument.

This is where the sed command I used comes in super handy! It sanitizes the delimiter by escaping any regex meta characters. Would like to thank Ed Morton for the great answer to this question where I got the sed command from.

Hope someone else finds this useful!

My awk variables are:

curr_delim="^_^"
a_col_update=<column number to be updated>
pick=<column used to pick the record/row>
a_del=<field to be updated from picked record/row>
a_ins=<new value for the picked field>
OFS="$curr_delim"

How to update a multichracter delimited field in a file in bash?

4 Answers4