
Fields in my file awk-test are enclosed in double quotes and delimited by semicolons:

"col1";"col2";"col3";"col4";"col5";
"eiusmod";"tempor";"incididunt";"ut";"labore";
"et";"dolore";"magna";"aliqua";"Ut";
"enim";"ad";"minim";"veniam";"quis";
"ut";"aliquip";"ex";"ea";"commodo";
"nostrud";"exercitation";"ullamco";"laboris";"nisi";

Real data (header line plus three records):

"col1";"col2";"col3";"col4";"col5";
"/absence/lang/#LANG_ID#/.descr.php";"BP2_DESCR";"Dodaj";"Add";"Adicionar";
"/cal/lang/#LANG_ID#/cal_feed.php";"LF_COMM_MSG";"je komentiral ""#EVENT_TITLE#""";"commented on an event ""#EVENT_TITLE#""";"comentado sobre o evento ""#EVENT_TITLE#""";
"/mod/lang/#LANG_ID#/set_events.php";"IM_NOTIFY";"Pozdravljeni #USER_NAME#!

#FROM_USER# vam je poslal(a) sporocilo.

------------------------------------------

#FROM_USER#: #MESSAGE#

------------------------------------------;"Hello #USER_NAME#!

You have a new notification from #FROM_USER#

------------------------------------------

#MESSAGE#

------------------------------------------;"Olá #USER_NAME#!

Você tem uma nova notificação de #FROM_USER# 

------------------------------------------

 #MESSAGE# 

------------------------------------------;

I know how to print columns 3 and 4 of the first 30 lines when column 3 contains the character "m":

gawk 'BEGIN {FS = ";" } ; $3 ~/m/ {print $3 ";" $4} NR==30{exit}' OFS=';' awk-test 

The result is:

"magna";"aliqua"
"minim";"veniam"
"ullamco";"laboris"

But I don't know how to replace "m" with "x" (a) in a 30-line test sample and (b) in the real 250,000-line file.

Desired output on awk-test:

"xagna";"aliqua"
"xinim";"veniam"
"ullaxco";"laboris"

In reality I only need to fix character errors in column 3. How can I write all lines, both the changed and the unchanged ones, into a new file that contains the fixed column 3?
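For the simple sample, a minimal sketch, assuming no field contains a `;`, a newline or a doubled quote (the function name `fix_col3` is hypothetical): modify `$3` in place and use a bare `1` pattern so every record is printed, whether it changed or not.

```shell
# Hypothetical helper: replace the first "m" with "x" in column 3, keep every line.
# Assumes simple data: no field contains ";", a newline or a doubled quote.
fix_col3() {
    awk 'BEGIN { FS = OFS = ";" }
         NR > 1 { sub(/m/, "x", $3) }  # leave the header line alone
         1                             # print every record, changed or not
    '
}
```

Run it as `fix_col3 < awk-test > awk-test.fixed` for the whole file, or `head -n 30 awk-test | fix_col3` for a 30-line trial. `sub()` replaces only the first "m" in the field, which matches the desired "xinim"; use `gsub()` instead to replace every occurrence.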

Thank you in advance!

andrej
  • Can your fields contain semi-colons or newlines? If the answer is "no" then why are you enclosing them in quotes? – Ed Morton Oct 13 '17 at 17:42
  • This file is not mine; it is a CSV output file I have to deal with. Each field is enclosed in quotes and delimited by semicolons, and there is also a semicolon at the end of each line. Yes, fields also contain semicolons and newlines. – andrej Oct 13 '17 at 18:28
  • Then the answer you accepted won't work for you as it assumes you have neither of those situations. Edit your question to show some truly representative sample input and output, including semicolons and newlines within fields, if you'd like help. – Ed Morton Oct 13 '17 at 18:30
  • I see in addition to semicolons and newlines your fields can also include escaped quotes. Given your updated input you need the solution posted at https://stackoverflow.com/q/45420535/1745001. Just use `;` instead of `,` in it. – Ed Morton Oct 13 '17 at 19:11
  • Thank you for your instructions, Ed, but I am quite lost as I am an absolute beginner. Could you explain the code `awk -v FPAT='[^,]*|"[^"]+"' '{for (i=1; i<=NF;i++) print i, "<" $i ">"}'`, and could you tell me how I can write the changed as well as the unchanged records to a new file? – andrej Oct 13 '17 at 19:26
  • That script doesn't apply to you - you need the bigger one under it that doesn't use FPAT. `print` will produce output so `awk '{ ... print... }' input > output`. If you do want to learn about FPAT though, see https://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content. – Ed Morton Oct 13 '17 at 19:39
  • I will study FPAT. Do I understand correctly that the form is `awk 'statement' inputFile > outputFile`? And how do I ensure that **all records** are written to the new file, the changed ones as well as the ones that were not altered? – andrej Oct 13 '17 at 19:52
  • There's nothing special about changed vs unchanged records. When a `print` executes it outputs the current record whether it's been changed or not. Not sure I understand your question, sorry. – Ed Morton Oct 13 '17 at 19:57

1 Answer


awk solution:

$ cat tst.awk
BEGIN{FS=OFS=";"}
NR>1 && sub(/m/,"x",$3){print $3, $4}

This also works on a large file such as the real 250,000-line one, as long as no field contains a semicolon or a newline:

$ awk -f tst.awk file
"xagna";"aliqua"
"xinim";"veniam"
"ullaxco";"laboris"

or, with a one-liner:

awk 'BEGIN{FS=OFS=";"} NR>1 && sub(/m/,"x",$3){print $3, $4}' file
Marc Lambrichs