
I want to change the value of a specific cell in a CSV file from the bash command line.

I have a sellers.csv:

seller_id,seller_zip_code_prefix,seller_city,seller_state
3442f8959a84dea7ee197c632cb2df15,13023,campinas,SP
723a46b89fd5c3ed78ccdf039e33ac63,93310,novo hamburgo, rio grande do sul, brasil,RS

As you can see, row 3, column 3 (seller_city) violates the one-value-per-field rule because it contains commas. That's why MySQL says "Row 3 contained more data than there were input columns". I want to change `novo hamburgo, rio grande do sul, brasil` into `novo hamburgo rio grande do sul brasil`. I tried awk, but it says I am supplying the wrong arguments:

awk -v r=553 -v c=3 -v val="novo hamburgo - rio grande do sul - brasil" -F sellers.csv

awk: option requires an argument -- F
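For reference, that command fails because -F must be followed by a separator character, not a filename, and no awk program body was supplied. A minimal sketch of the intended row/column edit, using stand-in data rather than the real sellers.csv:

```shell
# -F / FS sets the input separator; OFS keeps commas on output.
# NR==r selects the row, $c=val overwrites the column, the final 1 prints.
printf '%s\n' 'a,b,c,d' 'e,f,g,h' > demo.csv   # stand-in data
awk -v r=2 -v c=3 -v val='NEW' 'BEGIN{FS=OFS=","} NR==r{$c=val} 1' demo.csv
# → a,b,c,d
#   e,f,NEW,h
```

Note the caveat: this only works when the target row splits into the expected number of fields. On row 553 the unquoted internal commas shift the columns, which is exactly the problem described below.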

Cyrus

3 Answers


You don't need to replace the commas, just quote your fields:

$ awk -F',' -v OFS='","' '{city=$0; sub(/([^,]*,){2}/,"",city); sub(/,[^,]*$/,"",city); print "\"" $1, $2, city, $NF "\""}' sellers.csv
"seller_id","seller_zip_code_prefix","seller_city","seller_state"
"3442f8959a84dea7ee197c632cb2df15","13023","campinas","SP"
"723a46b89fd5c3ed78ccdf039e33ac63","93310","novo hamburgo, rio grande do sul, brasil","RS"

But if you really don't want to do that you could do:

$ awk 'BEGIN{FS=OFS=","} {city=$0; sub(/([^,]*,){2}/,"",city); sub(/,[^,]*$/,"",city); gsub(/ *, */," - ",city); print $1, $2, city, $NF}' sellers.csv
seller_id,seller_zip_code_prefix,seller_city,seller_state
3442f8959a84dea7ee197c632cb2df15,13023,campinas,SP
723a46b89fd5c3ed78ccdf039e33ac63,93310,novo hamburgo - rio grande do sul - brasil,RS

For anything else, see What's the most robust way to efficiently parse CSV using awk?.

Ed Morton
  • Thank you for the answer! Initially I removed the quotes. Standard CSV places strings in " quotes precisely to protect internal ',' characters, and those quotes were making MySQL read some integer values in the CSV as strings. To solve that, I stripped the quotes with: awk '{gsub(/\"/,"")};1' input.csv. But that caused another error: row 553 has `novo hamburgo, rio grande do sul, brasil` as its seller_city value, so MySQL gave the error "Row 553 contained more data than there were input columns". Now I need to fix that. I'll try your advice – Abdu Malikov Nov 07 '20 at 17:33

The simplest option is to produce the CSV file with | instead of , as the field separator. That way you avoid this problem entirely.

Standard CSV places strings in " quotes in order to protect internal , characters.

See this question.
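Once no field contains a literal comma (for example, after applying one of the fixes above), switching the delimiter is a plain character translation. A sketch, where the .psv output filename is just a made-up convention:

```shell
# Translate every comma to a pipe. Safe only when the fields themselves
# contain no commas; otherwise the embedded commas get translated too.
tr ',' '|' < sellers.csv > sellers.psv
```

MySQL's LOAD DATA INFILE can then read the result with FIELDS TERMINATED BY '|'.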

Dudi Boy
  • You are absolutely right: CSV places strings in " quotes. Those quotes were making MySQL read some integer values in the CSV as strings, so to solve that I removed the quotes with the bash command: awk '{gsub(/\"/,"")};1' input.csv. But that caused another error: row 553 has `novo hamburgo, rio grande do sul, brasil` as its seller_city value, so MySQL gave the error "Row 553 contained more data than there were input columns". To fix that, I had to look for a way to replace the internal commas. – Abdu Malikov Nov 07 '20 at 17:31

This might work for you (GNU sed):

sed ':a;s/,/&/4;T;s// -/3;ta' file

If the current line contains at least four commas, replace the third comma with ` -` and repeat until at most three remain (three commas means four fields, which is the correct shape).
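Applied to the problem row from the question (GNU sed, since the T branch command is a GNU extension):

```shell
# s/,/&/4 is a no-op substitution used as a test for a 4th comma;
# T exits the loop when it fails; s// -/3 reuses the last regex (a comma)
# and turns the 3rd comma into " -"; ta loops back.
printf '%s\n' '723a46b89fd5c3ed78ccdf039e33ac63,93310,novo hamburgo, rio grande do sul, brasil,RS' |
  sed ':a;s/,/&/4;T;s// -/3;ta'
# → 723a46b89fd5c3ed78ccdf039e33ac63,93310,novo hamburgo - rio grande do sul - brasil,RS
```

Well-formed rows with three commas pass through untouched, so it is safe to run over the whole file.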

potong