
I'm having trouble ignoring the commas within a bash variable so that the csv file doesn't split up the variable into different columns. If the variable were "TACGTAT,TACG", I would want that as a single column instead of two different columns.

Here is my full script:

for filename in "$1"/*.vcf; do
  bcftools query -f '%POS %REF %ALT\n' "$filename" > temp_reads.txt
  echo "Sick Read!: "$(cat temp_reads.txt)""
  echo ""$(basename "$filename")","$(cat temp_reads.txt)"" >> output.csv
done

And I specifically want everything in the "$(cat temp_reads.txt)" expansion to be included as a single column in the csv file in case there happened to be a comma in there.

Thanks!

Kenny Workman
  • 1
    Possible duplicate of [How to ignore commas within a CSV file being read by bash script](https://stackoverflow.com/questions/44522929/how-to-ignore-commas-within-a-csv-file-being-read-by-bash-script) – R4444 Apr 02 '19 at 19:41
  • 1
    Include input and output samples in your question. – oguz ismail Apr 02 '19 at 19:41
  • 3
    Your quoting style negates most benefits of quoting; `""` is removed completely by the shell. `echo ""$(cmd)","$(cmd)""` is the same as `echo $(cmd),$(cmd)`; you probably want `echo "$(cmd),$(cmd)"` instead. – Benjamin W. Apr 02 '19 at 19:52
  • 3
    As @BenjaminW. mentioned your `""` is getting removed by the shell, but what I think you want is `echo "\"$(basename "$filename")\",\"$(cat temp_reads.txt)\""`. Escaping the inner `"` like this will cause the shell to ignore them when doing shell expansion and echo will print them. This handles the case when you may have commas in the contents of temp_reads.txt, but if you have any `"` in there you're going to have to do some extra work. – woolfie Apr 02 '19 at 19:59
  • @BenjaminW. why is this? could you explain how the shell ignores that a little more? – Kenny Workman Apr 02 '19 at 20:15
  • @woolfie I have been trying to figure out the backslash syntax with regard to ignoring quotes for a little while. Could you try to explain this? – Kenny Workman Apr 02 '19 at 20:16
  • 1
    @woolfie has it right; I think you want a double quote within a double quoted string, but you can't do `echo """` – you need to escape, `echo "\""`. – Benjamin W. Apr 02 '19 at 20:51
  • 1
    @KennyWorkman One of the best places to read about this is the [Quoting section of the bash manpage](https://www.gnu.org/software/bash/manual/bash.html#Quoting). "A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows, with the exception of newline." Bash escapes a double quote by either **\"** or placing it in single quotes **' " '**. The slight complication here is that [section 2, rule 7 of the CSV standard](https://tools.ietf.org/html/rfc4180#page-2) states that a double quote inside a quoted field must be escaped by preceding it with another double quote. – woolfie Apr 03 '19 at 15:31
  • Thanks @woolfie! Your solution worked for me. Just so I understand correctly, using single quotes to escape double quotes would disallow any variable substitution within the quotes, whereas the backslash still allows such substitution? – Kenny Workman Apr 03 '19 at 17:30
  • Maybe see [When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Apr 03 '19 at 17:53

2 Answers


The wrangling of temporary files is completely unnecessary anyway.

for filename in "$1"/*.vcf; do
  bcftools query -f '%POS %REF %ALT\n' "$filename" |
  sed "s/^/$(basename "$filename"),/"
done >output.csv

Generally speaking, you cannot nest double quotes: `""foo` is just an unquoted `foo` with an empty quoted string to its left (which of course disappears entirely by the time the shell is done parsing this expression).
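
To make the quoting behaviour concrete, here is a minimal sketch. The sample value is made up for illustration and the variable names are hypothetical; it only contrasts the three forms discussed in the comments.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, not taken from the question's real data.
value='TACGTAT,TACG'

# The adjacent "" pairs are empty strings; the shell removes them,
# so this is the same as: echo $value
unquoted=$(echo ""$value"")

# Backslash-escaped quotes stay literal inside a double-quoted string,
# while $value is still expanded.
escaped=$(echo "\"$value\"")

# A single-quoted printf format keeps the literal quotes without escaping.
formatted=$(printf '"%s"' "$value")

echo "$unquoted"    # TACGTAT,TACG
echo "$escaped"     # "TACGTAT,TACG"
echo "$formatted"   # "TACGTAT,TACG"
```

Only the last two forms produce a field that a CSV reader will treat as a single column.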

Notice also how moving the redirection after the done improves legibility and efficiency. Because you only redirect once, you can write instead of append (assuming you don't need to append for other reasons, of course) and you don't open, seek to the end of the file, write, and close every time through the loop, so you save a fair bit on the I/O overhead.
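
The redirection pattern can be tried in isolation with a sketch like the one below. The file names in the loop and the `placeholder` field are made up, and `outfile` is a hypothetical scratch file standing in for output.csv.

```shell
#!/usr/bin/env bash
# Sketch with made-up data; outfile is a hypothetical scratch file.
outfile=$(mktemp) || exit 1

# Redirecting after `done` opens the file once for the whole loop,
# so a truncating `>` is enough and no per-iteration `>>` is needed.
for name in sample1.vcf sample2.vcf; do
  printf '%s,%s\n' "$name" "placeholder"
done > "$outfile"

# Read the result back, then clean up the scratch file.
contents=$(cat "$outfile")
rm -f "$outfile"

echo "$contents"
```

Both lines end up in the file even though `>` truncates, because the truncation happens once, before the first iteration.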

tripleee
  • I'm brand new to scripting. Would you say that if I want to use a temporary file, I'm probably wrong? – Kenny Workman Apr 03 '19 at 17:43
  • 1
    You generally want to avoid temporary files if you can, yes. If you can't, you need to make sure that every temporary file has a unique name, otherwise two instances of your program cannot run at the same time; and in the worst case, the predictable file name could be the vector for a security problem. – tripleee Apr 03 '19 at 17:52
  • 1
    ... And of course, hitting the disk when you don't have to makes things slower unnecessarily; recall that writing a byte to disk and reading it back can easily be thousands of times slower than writing to and reading from memory. – tripleee Apr 03 '19 at 17:59
  • @tripleee The one problem I'm running into is that when I wrote to a file, it was pretty easy to export the multi-line contents of that file to a single cell of the CSV file, because the entire thing could be wrapped with commas relatively easily. Do you see a workaround using the stream editor? The current method puts each line of the output stream in a separate cell. – Kenny Workman Apr 12 '19 at 21:26
  • Without seeing your data I can't, but if you can specify what format the spreadsheet expects, I'm sure a simple script can be written to produce that. Maybe `sed "s/.*/$(basename "$filename"),\"&\"/"` if I'm guessing correctly what you mean? – tripleee Apr 13 '19 at 05:27
  • (Importing perfectly good text data back into a spreadsheet sounds like a step in the wrong direction, though.) – tripleee Apr 13 '19 at 05:30
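
For cases where a temporary file really is needed, the advice in the comments about unique names could look roughly like this. It is a sketch only: the two data lines are invented stand-ins for the `bcftools` output, and `tmpfile` and `line_count` are hypothetical names.

```shell
#!/usr/bin/env bash
# mktemp creates a file with a unique, unpredictable name, so two
# instances of the script do not overwrite each other's data.
tmpfile=$(mktemp) || exit 1

# Remove the file when the script exits, even after an error.
trap 'rm -f "$tmpfile"' EXIT

# Stand-in for the real command's output (made-up data).
printf '%s\n' '101 A T,G' '202 C G' > "$tmpfile"

# Arithmetic expansion strips any padding some wc implementations add.
line_count=$(wc -l < "$tmpfile")
echo "lines: $((line_count))"
```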

Test data:

echo 'a,b and "c".' > temp_reads.txt
filename="What a lovely name"

Incorrect solution, which ignores double quotes in the file:

printf '"%s","%s"\n' "${filename}" "$(cat temp_reads.txt)"

Escaping each double quote in the file with a second one:

sed 's/"/""/g' temp_reads.txt

Combined

printf '"%s","%s"\n' "${filename}" "$(sed 's/"/""/g' temp_reads.txt)"
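
The same idea can be done without `sed`, using bash parameter expansion. This is a sketch under assumptions: `csv_field` and `csv_row` are hypothetical helper names, not part of any tool, and the input is the test data from above.

```shell
#!/usr/bin/env bash
# csv_field (hypothetical helper) doubles every embedded double quote
# and wraps the value in double quotes, as RFC 4180 describes.
csv_field() {
  local value=$1 q='"'
  value=${value//$q/$q$q}
  printf '"%s"' "$value"
}

# csv_row (hypothetical helper) quotes each argument, joins the
# fields with commas, and ends the record with a newline.
csv_row() {
  local sep='' field
  for field in "$@"; do
    printf '%s' "$sep"
    csv_field "$field"
    sep=','
  done
  printf '\n'
}

csv_row "What a lovely name" 'a,b and "c".'
# "What a lovely name","a,b and ""c""."
```

Because the whole value is quoted, a multi-line value such as `"$(cat temp_reads.txt)"` also stays inside one field, provided the program reading the CSV accepts line breaks within quoted fields.
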
Walter A