
Hello everyone, I'm a beginner in shell scripting. On a daily basis I need to convert a file's data to another format; I usually do it manually with a text editor, but I often make mistakes. So I decided to write a simple script that can do the work for me. The file's content looks like this:

/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5

to this:

a2>a3
b2>b3
c2>c3

The script should ignore the first line and print the second and third values, separated by '>'.

I'm halfway there; here is my code:

#!/bin/bash
#while Loops

i=1
while IFS=\" read t1 t2 t3
do
    test $i -eq 1 && ((i=i+1)) && continue
    echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; } 
done < $1 

The problem with my code is that the last line isn't printed unless the file ends with a newline (`\n`). Also, I want the output written to a new CSV file (I tried redirecting standard output to my new file, but only the last echo is printed there). Can someone please help me out? Thanks in advance.

woodyboow
    About the last line ending with '\n' this is a common issue with `read`, there are several ways to fix, e.g. `while IFS=\" read t1 t2 t3 || [ -n "$t1" ]` – vdavid Dec 09 '20 at 13:09
  • you mean ignore the first column? – Aalexander Dec 09 '20 at 13:31
  • @Alex I was giving a solution to the issue where the last line will not enter the while loop if such line does not end with `\n` – vdavid Dec 09 '20 at 13:35
  • Also, in order to discard the first line, you don't need to increment a line counter, this is very heavy. You can either read a line and then enter your loop, e.g. `(read t1; while IFS=\" read t1 t2 t3 ... done) < $1` or you can run a program that removes the first line, such as `tail -n +2` or `sed 1d`, e.g. end the loop with `done < <(tail -n +2 $1)` – vdavid Dec 09 '20 at 13:38
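Putting the comments' suggestions together, a rough sketch of the fixed loop might look like this (`input.csv` and `output.csv` are assumed names, and the field extraction is simplified compared to the original `IFS=\"` approach):

```shell
#!/bin/sh
# Sample input as shown in the question; note the missing final newline
printf '/release201209\na1,a2,"a3",a4,a5\nb1,b2,"b3",b4,b5\nc1,c2,"c3",c4,c5' > input.csv

# tail -n +2 drops the header line (no counter needed);
# `|| [ -n "$line" ]` still processes a last line with no trailing newline;
# redirecting after `done` sends every echo into the new CSV file at once.
tail -n +2 input.csv | while IFS= read -r line || [ -n "$line" ]; do
    f2=$(echo "$line" | cut -d, -f2)
    f3=$(echo "$line" | cut -d, -f3 | tr -d '"')
    echo "$f2>$f3"
done > output.csv

cat output.csv
```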

3 Answers


Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). E.g.:

$ < input tr -d '"'  | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3

If you cannot just strip the quotes as in your sample input because those quotes are escaping commas, you could hack together a solution, but you would be better off using a proper CSV parsing tool (e.g. Perl's Text::CSV).

William Pursell

Here's a simple pipeline that will do the trick:

sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'

Here, we're just removing the first line (as desired), selecting fields 2 and 3 (based on a comma field separator), removing the double quotes, and mapping the remaining `,` to `>`.

costaparas
  • It was working well until I found that it does not work for data containing an embedded comma, like this one: `data,VERSION,"FUNDS.TRANSFER,ASS.VERS.TIERS.BOP",,`, which returns `VERSION>FUNDS.TRANSFER` instead of `VERSION>FUNDS.TRANSFER,ASS.VERS.TIERS.BOP`. Can you help me update it, please? Thanks – woodyboow Jan 17 '21 at 21:36
  • @woodyboow in such a case, it's better to use an existing tool to do the parsing, as I describe [in this answer](https://stackoverflow.com/a/65767645/14722562). – costaparas Jan 18 '21 at 01:27

Use this Perl one-liner:

perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file

The Perl one-liner uses these command line flags:
  • `-e` : Tells Perl to look for code in-line, instead of in a file.
  • `-n` : Loop over the input one line at a time, assigning it to `$_` by default.
  • `-l` : Strip the input line separator (`"\n"` on *NIX by default) before executing the code in-line, and append it when printing.
  • `-a` : Split `$_` into array `@F` on whitespace or on the regex specified in the `-F` option.
  • `-F','` : Split into `@F` on comma, rather than on whitespace.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Timur Shtatland