
I'm parsing a CSV file in bash where one column holds more than one value in some rows. For example, a line with multiple values may look like this:

name,12,120,east,"sw1,sw2,sw3"

Not all rows have this, though. Some look like this:

name,10,141,west,sw5534a

If that column contains quotes, I want to remove them and set the variable to just sw1,sw2,sw3.

Relevant parts of the script:

#!/bin/bash
INPUT=file.csv
OLDIFS=$IFS
IFS=,
[ ! -f "$INPUT" ] && { echo "$INPUT file not found"; exit 99; }
while read name building id region parents
do
echo "
....snip....
parents $parents"
done < "$INPUT"
IFS=$OLDIFS

The output I want for $parents is sw1,sw2,sw3, but right now it prints "sw1,sw2,sw3", quotes included. I've tried using regex matching in a conditional that, if the column contains a comma, removes the first and last two characters, but I couldn't get it to work: either it removed the first s and the last 3, or it just errored out.
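For reference, the regex-in-a-conditional idea can work along these lines. This is a minimal sketch assuming bash 3.2 or later (for `[[ =~ ]]` and `BASH_REMATCH`); the function name `strip_outer_quotes` is hypothetical. It only strips the quotes when both the leading and the trailing quote are present, and keeping the regex in a variable avoids quoting problems inside `[[ ]]`.

```bash
#!/bin/bash
# Strip one pair of surrounding double quotes using bash's =~ operator.
# The regex is stored in a variable so [[ ]] does not treat it as a quoted string.
strip_outer_quotes() {
    local value=$1
    local re='^"(.*)"$'
    if [[ $value =~ $re ]]; then
        # BASH_REMATCH[1] holds the text captured between the quotes
        value=${BASH_REMATCH[1]}
    fi
    printf '%s\n' "$value"
}

strip_outer_quotes '"sw1,sw2,sw3"'   # prints sw1,sw2,sw3
strip_outer_quotes 'sw5534a'         # prints sw5534a
```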

Any suggestions appreciated!

Ross
  • Can you not just do `sed 's/"//g' file.csv`? – Kevin Jan 02 '15 at 04:53
  • Can any other fields in the file be quoted like this? – Barmar Jan 02 '15 at 04:56
  • @Barmar: I disagree with your marking this question as a duplicate. That other question is about awk, not bash. – Steve Vinoski Jan 02 '15 at 05:05
  • It's about using `awk` to parse a CSV file in a `bash` script. I don't think you can do this properly with just `bash`. – Barmar Jan 02 '15 at 05:06
  • Side issue: you can set IFS locally for the `read` statement. That avoids having to save and restore it, and also avoids any interactions it might have inside the body of the while loop: `while IFS=, read name building id region parents`. Also, you practically always want to supply the `-r` flag to `read`. – rici Jan 02 '15 at 05:52
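A minimal sketch of rici's suggestion, assuming bash. `print_parents` is a hypothetical helper that reads rows from standard input (in the real script the loop would read from `"$INPUT"` instead), and the sample rows are the two from the question. It still prints the quotes around `sw1,sw2,sw3`; the answers below deal with that part.

```bash
#!/bin/bash
# Print the parents field of each CSV row read from stdin.
# IFS=, applies only to this read command, so the global IFS is never
# changed and nothing needs to be saved or restored. -r keeps backslashes literal.
print_parents() {
    local name building id region parents
    while IFS=, read -r name building id region parents; do
        # The last variable receives the rest of the line, commas included.
        printf '%s\n' "$parents"
    done
}

printf '%s\n' 'name,12,120,east,"sw1,sw2,sw3"' 'name,10,141,west,sw5534a' | print_parents
```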

2 Answers


You can remove every " character from the $parents variable with bash's substring replacement (pattern substitution):

echo ${parents//\"/}

This replaces all " characters with the empty string.
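A small self-contained illustration of the substitution above, with the expansion double-quoted as tripleee recommends in the comments. The variable names `clean` and `note` are only for illustration. Note that this removes quotes anywhere in the value, not just at the ends.

```bash
#!/bin/bash
# ${var//pattern/replacement} replaces every match of pattern; an empty
# replacement deletes the matches. \" is a literal double quote.
parents='"sw1,sw2,sw3"'
clean=${parents//\"/}
# Quote the expansion so runs of whitespace and glob characters survive.
echo "$clean"        # prints sw1,sw2,sw3

# Every quote is removed, not only the ones at either end.
note='say "hi"'
echo "${note//\"/}"  # prints say hi
```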

Steve Vinoski
  • How does that help? The idea is to treat that as one field because it's quoted. – Barmar Jan 02 '15 at 04:54
  • Works great! Thanks heaps! (Won't let me accept your answer for another 3 minutes, but I will at that point) – Ross Jan 02 '15 at 04:56
  • @Barmar: read the question. He wants the double quote characters in the field value removed using bash constructs. My code does exactly that. – Steve Vinoski Jan 02 '15 at 05:04
  • Sorry, misread, I thought you were removing them from the line before reading it. – Barmar Jan 02 '15 at 05:05
  • Although if there are any other fields in the CSV that have commas in them, the `read` statement will put the wrong fields in `parents`. – Barmar Jan 02 '15 at 05:05
  • And as always, if the string contains runs of whitespace or shell wildcards, not quoting it properly will mangle those. You want `echo "${parents//\"/}"` with double quotes, and basically always use double quotes [unless you know what you are doing](http://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-variable). – tripleee Jan 02 '15 at 05:47
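Barmar's concern about other quoted fields containing commas can be handled in pure bash for simple cases. The following is a minimal sketch, not a full RFC 4180 parser: it assumes one record per line and double-quoted fields, and `split_csv_line` and the `FIELDS` array are hypothetical names. Because it walks the line one character at a time, it will be slow on large files, where awk or a dedicated CSV tool is usually the better choice.

```bash
#!/bin/bash
# Split one CSV line into the global array FIELDS, treating commas inside
# double-quoted fields as data. Surrounding quotes are dropped and a doubled
# quote ("") inside a quoted field becomes one literal quote.
# Limitation: quoted fields that span multiple lines are not supported.
split_csv_line() {
    local line=$1 field='' in_quotes=0 i ch
    FIELDS=()
    for (( i = 0; i < ${#line}; i++ )); do
        ch=${line:i:1}
        if (( in_quotes )); then
            if [[ $ch == '"' ]]; then
                if [[ ${line:i+1:1} == '"' ]]; then
                    field+='"'         # escaped quote: keep one, skip the next
                    i=$((i + 1))
                else
                    in_quotes=0        # closing quote
                fi
            else
                field+=$ch
            fi
        elif [[ $ch == '"' ]]; then
            in_quotes=1                # opening quote
        elif [[ $ch == ',' ]]; then
            FIELDS+=("$field")         # a comma outside quotes ends the field
            field=''
        else
            field+=$ch
        fi
    done
    FIELDS+=("$field")                 # last field
}

split_csv_line 'name,12,120,east,"sw1,sw2,sw3"'
printf '%s\n' "${FIELDS[4]}"           # prints sw1,sw2,sw3
```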
parents="${parents#\"}"
parents="${parents%\"}"

This removes the first character if it is a quote and the last character if it is a quote. Any other first or last character is left untouched.
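A minimal sketch of the same two expansions wrapped in a hypothetical `strip_quotes` function. Unlike the `//` substitution in the other answer, quotes inside the value are preserved, and each end is handled independently.

```bash
#!/bin/bash
# ${var#pattern} removes the shortest match of pattern from the front;
# ${var%pattern} removes the shortest match from the end.
# Each strips at most one quote, and only if it is actually there.
strip_quotes() {
    local value=$1
    value=${value#\"}
    value=${value%\"}
    printf '%s\n' "$value"
}

strip_quotes '"sw1,sw2,sw3"'   # prints sw1,sw2,sw3
strip_quotes 'sw5534a'         # prints sw5534a
strip_quotes '"a "quoted" b"'  # prints a "quoted" b
```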

Amadan