3

I'm trying to match a certain field and update its data from a file delimited with multiple characters. I'm using this to create an imitation of SQL's UPDATE. This is part of a bigger project to create a mini DBMS with bash.

What I tried:

sed "s/\^\_\^/:/g" $file_path | cut -d: -f1 | grep -nw SAR | sed "s/\^\_\^/:/g" | cut -d: -f2 | sed -i "s/$match/$update/g"

My issue is I am unable to use sed -i to update only these specific columns found as you can't pipe into it.

The delimiter being used is : ^_^

Example of data file:

'EGP'^_^'Egypt'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'Europe'^_^'SAR'
'SAR'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

my $match can be SAR for example and $update would be USD

Expected change to data file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

If there is a different approach that is better, that is also welcome as I am fairly new to bash scripting.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
M.Elsayed
  • 85
  • 1
  • 7
  • 3
    If you have fields `awk` is probably a better option than `sed`. See the `awk` `FS` variable (Field Separator) in the `awk` documentation. – Renaud Pacalet Jan 06 '22 at 15:18

4 Answers4

4

Better to use awk here:

awk -v s="'SAR'" -v q="'USD'" -F'\\^_\\^' -v OFS='^_^' '$1==s {$1=q} 1' file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

Explanation:

  • -v s="'SAR'" sets command line variable s='SAR'
  • -v q="'USD'" sets command line variable s='USD'
  • -F '\\^_\\^' sets input field separator as ^_^. We need to escape ^ as that is a special regex meta character and we need to double escape it because we're using it in a string that is later converted to a regexp (field separator).
  • -v OFS='^_^' sets output field separator to ^_^
  • $1 == s compares first field to 'SAR'
  • $1 = q sets $1 to variable 'USD'
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
anubhava
  • 761,203
  • 64
  • 569
  • 643
2

Using sed you could use $match at the start of the string ^, and change it with $update:

match="'SAR'"
update="'USD'"
sed "s/^$match\^_\^/$update^_^/" file

Output

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

See a bash demo.

The fourth bird
  • 154,723
  • 16
  • 55
  • 70
  • 2
    A bit simpler, maybe: `match="'SAR'"; update="'USD'"; sed "s/^$match\\^_\\^/$update^_^/" file`. – Renaud Pacalet Jan 06 '22 at 15:45
  • @RenaudPacalet Using the single quotes in the variables you mean right? That would make it more readable indeed. – The fourth bird Jan 06 '22 at 15:47
  • This definitely works, only question is what happens if I want to update the second field? The ^ only matches the start of a given pattern. – M.Elsayed Jan 06 '22 at 15:57
  • @Stoon If you want to match the nth match, you can use a quantifier and a capture group to skip n fields before matching what you want to replace. It would be like `sed "s/^\(\('[^']*'\^_\^\)\{2\}\)$match\^_\^/\1$update^_^/g" file` See [this example](https://tio.run/##S0oszvj/PzexJDnDVkk92DFIXYmrtCAlsSQVyA0NdgFyi1NTFJSK9eNiNGI01KPj1GO11GPi4mPiYjRjqo1iamM0VcDawWL6MYYqEO1x8XH66UoKNjY2ChBzgQLqrqVF@QWpYCZMSOn/fwA) but in that case awk would be a better fit. – The fourth bird Jan 06 '22 at 16:37
  • 1
    I see, didn't realize you can do that with sed. A bit convoluted but works. I guess I'll proceed with awk like you and the accepted answer suggested :) – M.Elsayed Jan 06 '22 at 17:09
1

With your shown samples, please try following awk program.

awk -v matchvalue="'SAR'" -v updatedvalue="'USD'" '
match($0,"^"matchvalue"\\^_\\^"){
  print updatedvalue"^_^" substr($0,RSTART+RLENGTH)
  next
}
1
'  Input_file

Explanation: Creating 2 variables named matchvalue and updatedvalue with 'SAR' and 'USD' values shown by OP in samples. Then in main program using match function to match if a line starts with 'SAR'^_^ then print new value in its place and then print rest of the line, if line doesn't start from 'SAR'^_^ then simply print it.

RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
1

Here's what I ended up doing:

escaped_delm=$(echo $curr_delim | sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g')
awk -F"$escaped_delm" -v a_col_update=$update_field -v pick=$p_tmp_field -v a_del="$match" -v a_ins="$insert" -v OFS="$curr_delim" '$pick==a_del {$a_col_update=a_ins} 1' $t_path > tmp && mv tmp $t_path

The biggest issue I faced is that I am passing my delimiter ^_^ as a variable to awk. My script assumes that I don't know what the variable is and that it is passed to it at the beginning of the script. This causes an issue, just like the first answer suggested the -F option for awk thinks that ^ is a regex argument.

This is where the sed command I used comes in super handy! It sanitizes the delimiter by escaping any regex meta characters. Would like to thank Ed Morton for the great answer to this question where I got the sed command from.

Hope someone else finds this useful!

My awk variables are:

  1. curr_delim="^_^"
  2. a_col_update=<column number to be updated>
  3. pick=<column used to pick the record/row>
  4. a_del=<field to be updated from picked record/row>
  5. a_ins=<new value for the picked field>
  6. OFS="$curr_delim"
M.Elsayed
  • 85
  • 1
  • 7