How to replace a character in a certain column with awk

Question

Fields in my file awk-test are enclosed in double quotes and delimited with semicolons:

"col1";"col2";"col3";"col4";"col5";
"eiusmod";"tempor";"incididunt";"ut";"labore";
"et";"dolore";"magna";"aliqua";"Ut";
"enim";"ad";"minim";"veniam";"quis";
"ut";"aliquip";"ex";"ea";"commodo";
"nostrud";"exercitation";"ullamco";"laboris";"nisi";

Real data (header line plus three records):

"col1";"col2";"col3";"col4";"col5";
"/absence/lang/#LANG_ID#/.descr.php";"BP2_DESCR";"Dodaj";"Add";"Adicionar";
"/cal/lang/#LANG_ID#/cal_feed.php";"LF_COMM_MSG";"je komentiral ""#EVENT_TITLE#""";"commented on an event ""#EVENT_TITLE#""";"comentado sobre o evento ""#EVENT_TITLE#""";
"/mod/lang/#LANG_ID#/set_events.php";"IM_NOTIFY";"Pozdravljeni #USER_NAME#!

#FROM_USER# vam je poslal(a) sporocilo.

------------------------------------------

#FROM_USER#: #MESSAGE#

------------------------------------------;"Hello #USER_NAME#!

You have a new notification from #FROM_USER#

------------------------------------------

#MESSAGE#

------------------------------------------;"Olá #USER_NAME#!

Você tem uma nova notificação de #FROM_USER# 

------------------------------------------

 #MESSAGE# 

------------------------------------------;

I know how to print columns 3 and 4 from the first 30 lines when column 3 contains the character "m":

gawk 'BEGIN {FS = ";" } ; $3 ~/m/ {print $3 ";" $4} NR==30{exit}' OFS=';' awk-test

The result is:

"magna";"aliqua"
"minim";"veniam"
"ullamco";"laboris"

But I don't know how to replace "m" with "x" (a) on the 30-line test sample and (b) on the real file of 250.000 lines.

Desired output on awk-test:

"xagna";"aliqua"
"xinim";"veniam"
"ullaxco";"laboris"

In reality I only need to fix character errors in column 3. So I would like to know how to write both the changed lines and the unchanged ones to a new file that contains the fixed column 3.

Thank you in advance!

Can your fields contain semi-colons or newlines? If the answer is "no" then why are you enclosing them in quotes? — Ed Morton, Oct 13 '17 at 17:42
This file is not mine, it is a CSV output file I have to deal with. Each field is enclosed in quotes and delimited by semicolon. Semicolon is also on the end of each line. Yes, fields contain also semicolons and newlines. — andrej, Oct 13 '17 at 18:28
Then the answer you accepted won't work for you as it assumes you have neither of those situations. [edit] your question to show some truly representative sample input and output, including semicolons and newlines within fields, if you'd like help. — Ed Morton, Oct 13 '17 at 18:30
I see in addition to semicolons and newlines your fields can also include escaped quotes. Given your updated input you need the solution posted at https://stackoverflow.com/q/45420535/1745001. Just use `;` instead of `,` in it. — Ed Morton, Oct 13 '17 at 19:11
Thank you for your instructions Ed, but I am quite lost as I am an absolute beginner. Could you explain the code: awk -v FPAT='[^,]*|"[^"]+"' '{for (i=1; i<=NF;i++) print i, "<" $i ">"}' and could you answer how can I write the changes as well as unchanged records to new file? — andrej, Oct 13 '17 at 19:26
That script doesn't apply to you - you need the bigger one under it that doesn't use FPAT. `print` will produce output so `awk '{ ... print... }' input > output`. If you do want to learn about FPAT though, see https://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content. — Ed Morton, Oct 13 '17 at 19:39
I will study FPAT. Do I understand correctly: 'awk statement inputFile > outputFile' ? And how to ensure that **all records** are written to the new file, the changed ones as well as the ones that were not altered? — andrej, Oct 13 '17 at 19:52
There's nothing special about changed vs unchanged records. When a `print` executes it outputs the current record whether it's been changed or not. Not sure I understand your question, sorry. — Ed Morton, Oct 13 '17 at 19:57
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/156701/discussion-between-andrej-and-ed-morton). — andrej, Oct 13 '17 at 19:59
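For input like the real data, where a field may hold semicolons, newlines and doubled quotes, splitting on `;` alone is not enough. The following is a minimal sketch of one way to handle it in plain awk; it is not the script from the linked answer. It assumes every field is quoted, properly closed, and followed by `;`. The file names `sample.csv` and `sample-fixed.csv` and the sample rows are hypothetical.

```shell
# Hypothetical sample: a header, a plain row, a row with an embedded
# semicolon and doubled quotes, and a row whose third field spans two lines.
cat > sample.csv <<'EOF'
"col1";"col2";"col3";"col4";"col5";
"a.php";"K1";"magna";"aliqua";"Ut";
"b.php";"K2";"semi;colon ""my"" term";"x";"y";
"c.php";"K3";"line one
minim";"veniam";"quis";
EOF

awk '
{
    # Join physical lines until the number of quotes is even,
    # i.e. until no quoted field is left open.
    buf = (pending ? buf "\n" : "") $0
    tmp = buf
    pending = gsub(/"/, "", tmp) % 2
    if (pending) next

    # Peel off one quoted field (plus its ";") at a time.
    out = ""; rest = buf; n = 0
    while (match(rest, /^"([^"]|"")*";/)) {
        field = substr(rest, 1, RLENGTH)
        rest  = substr(rest, RLENGTH + 1)
        # Change only field 3, and skip the header record.
        if (++n == 3 && nrec > 0) sub(/m/, "x", field)
        out = out field
    }
    print out rest      # every record is printed, changed or not
    nrec++
}
END { if (pending) print buf }   # unterminated record: pass through as-is
' sample.csv > sample-fixed.csv

cat sample-fixed.csv
```

Because the fields are copied through as text rather than reassigned, the quoting, the embedded newlines and the trailing `;` are preserved exactly. Use `gsub` instead of `sub` if every "m" in the field should be replaced rather than only the first.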

Marc Lambrichs · Accepted Answer · 2017-10-13T17:24:11.873

awk solution:

$ cat tst.awk
BEGIN{FS=OFS=";"}
NR>1 && sub(/m/,"x",$3){print $3, $4}

This will also work on your real 250.000-line file, provided no field contains a semicolon or a newline:

$ awk -f tst.awk file
"xagna";"aliqua"
"xinim";"veniam"
"ullaxco";"laboris"

or, with a one-liner:

awk 'BEGIN{FS=OFS=";"} NR>1 && sub(/m/,"x",$3){print $3, $4}' file
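To keep every record, changed or not, and write the result to a new file, one possible variation is to do the substitution in an action and then print unconditionally. This is a minimal sketch, assuming the sample from the question is saved as `awk-test`; the output name `awk-test-fixed` is hypothetical. Like the script above, it assumes no field contains a semicolon or a newline.

```shell
# Recreate the sample from the question.
cat > awk-test <<'EOF'
"col1";"col2";"col3";"col4";"col5";
"eiusmod";"tempor";"incididunt";"ut";"labore";
"et";"dolore";"magna";"aliqua";"Ut";
"enim";"ad";"minim";"veniam";"quis";
"ut";"aliquip";"ex";"ea";"commodo";
"nostrud";"exercitation";"ullamco";"laboris";"nisi";
EOF

# Replace the first "m" in column 3 of each data row (NR>1 skips the header).
# The trailing 1 is an always-true pattern, so every record is printed,
# whether or not the substitution changed it.
awk 'BEGIN{FS=OFS=";"} NR>1{sub(/m/,"x",$3)} 1' awk-test > awk-test-fixed

cat awk-test-fixed
```

The header and the rows without an "m" in column 3 pass through unchanged, so the new file has the same number of lines as the input.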

edited Oct 13 '17 at 17:24

answered Oct 13 '17 at 17:14


no need to check $3 contains m, sub will just fail if it doesn't – 123 Oct 13 '17 at 17:17
Sure. I misread the question myself: lines having no `m` in $3 should not be printed. You can solve this putting the sub as condition in an `if` as well. – Marc Lambrichs Oct 13 '17 at 17:20
You can just do `awk 'BEGIN{FS=OFS=";"} NR>1 && sub(/m/,"x",$3){print $3, $4}'` – 123 Oct 13 '17 at 17:22
And that's even more concise. +1. – Marc Lambrichs Oct 13 '17 at 17:23
