Format .csv using bash

Question

I have data from an Internet table in a text file. I need to convert this file to the .csv standard (comma-separated, etc.) and clean it. E.g.:

Data    Fechamento  Variação    Variação (%)    Abertura    Máxima  Mínima  Volume
30 Abr 2020     2,00    0,76    61,29%  1,99    2,10    1,80    152.100
29 Abr 2020     1,24    -0,44   -26,19%     1,28    1,71    1,20    125.700

My code:

echo -e "File: \c"
read nome_arq

arq=$(<$nome_arq)

arq=$(echo $arq | sed 's/%//g')
arq=$(echo $arq | sed 's/()//g')
arq=$(echo $arq | sed 's/\.//g')
arq=$(echo $arq | sed 's/\+//g')
arq=$(echo $arq | sed 's/ Abr /_04_/g')
arq=$(echo $arq | sed 's/ Mar /\_03_/g')
arq=$(echo $arq | sed 's/\,/\./g')
arq=$(echo $arq | sed 's/\ /\,/g')

append="_clean"
echo -e $arq >> $nome_arq$append

However, there are no line breaks in the output; the output file has just a single line:

Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume,30_04_2020,2.00,0.76,61.29,1.99,2.10,1.80,152100,29_04_2020,1.24,-0.44,-26.19,1.28,1.71,1.20,125700,

What can I do to keep the original line breaks in my output?

Edit, May 5:

I get my result with the following code:

append="_clean"
cat $nome_arq|while read z;do echo "$z"|sed "s/\s\+/\"xxxx\"/g; s/^/\"/g; s/$/\"/g";done >> $nome_arq$append

sed 's/%//g' $nome_arq$append > output
rm $nome_arq$append
sed 's/()//g' output > output1
rm output
sed 's/\.//g' output1 > output2
rm output1
sed 's/\+//g' output2 > output3
rm output2
sed 's/\"//g' output3 > output4
rm output3
sed 's/xxxxMaixxxx/_05_/g' output4 > output5
rm output4
sed 's/xxxxAbrxxxx/\_04_/g' output5 > output6
rm output5
sed 's/xxxxMarxxxx/\_03_/g' output6 > output7
rm output6
sed 's/,/\./g' output7 > output8
rm output7
sed 's/xxxx/,/g' output8 > output9
rm output8

Obviously, it's far from optimized. I couldn't use the "tr" command, for example. How can I make my script leaner?

Edit, May 13:

The final code, with some modification:

echo -e "Arquivo nao-estruturado: \c"
read nome_arq

cp $nome_arq $nome_arq"_clean"
arq=$nome_arq"_clean"

sed -i 's/%//g;s/()//g;s/\.//g;s/\+//g;s/ Mai /_05_/g;s/ Abr /_04_/g;s/ Mar /\_03_/g;s/\,/\./g' $arq
sed -r -i  's/[[:space:]]+/,/g' $arq
sed -i 's/Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume/ref.date,price.close,var,var.perc,price.open,price.high,price.low,volume/g' $arq

Since many of your `sed` commands simply delete certain characters, you can do this more easily with, e.g., `arq=$(tr -d '%().+' <<<"$arq")` in bash, or `arq=$(echo "$arq"|tr -d '%().+')` in a POSIX shell (your question does not make clear which one you want to use). – user1934428, May 04 '20 at 08:33

score 1 · Answer 1 · answered May 03 '20 at 17:18


try this:

cat your_input_File|while read z;do echo "$z"|sed "s/\s\+/\",\"/g; s/^/\"/g; s/$/\"/g";done

This will return:

"Data","Fechamento","Variação","Variação","(%)","Abertura","Máxima","Mínima","Volume"
"30","Abr","2020","2,00","0,76","61,29%","1,99","2,10","1,80","152.100"
"29","Abr","2020","1,24","-0,44","-26,19%","1,28","1,71","1,20","125.700"

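Since `sed` already works through its input line by line, the same quoting can probably be done without the `while read` loop. The following is only a sketch under that assumption; `quote_fields` is a hypothetical helper name, and `\{1,\}` is used in place of `\+` so that it does not depend on GNU `sed`.

```shell
# quote_fields (hypothetical name): wrap every whitespace-separated
# field of the file given as $1 in double quotes, separated by commas.
quote_fields() {
    # 1) each run of whitespace becomes ","  2) open the first field
    # 3) close the last field
    sed 's/[[:space:]]\{1,\}/","/g; s/^/"/; s/$/"/' "$1"
}
```

It would be called as `quote_fields your_input_File`, and it writes the quoted lines to standard output just like the loop does.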
Ron


Please replace the useless cat by input redirection. – user1934428 May 04 '20 at 08:35
`cat` is used for ease of understanding, and is far from `useless`. `input redirect` is just another way of feeding in data, which does not render `cat` useless! – Ron May 04 '20 at 08:53
At least it is an additional process which could be avoided. See [here](https://stackoverflow.com/questions/11710552/useless-use-of-cat) for a discussion of the topic. – user1934428 May 04 '20 at 08:57

Yes, it is an additional process, however it is easy to understand in the example above, which is more practical in this case. Or you have a suggestion how this can be rewritten in Assembler so that we are REALLY optimal?? – Ron May 04 '20 at 09:09
Why do we need an assembler solution here? If you just do an input redirection into the `while` loop, the `cat` process is not needed, and the overall solution looks cleaner IMO ... although of course legibility is always somewhat subjective. – user1934428 May 04 '20 at 09:12
I used `cat your_input_File|while read z;do echo "$z"|sed "s/\s\+/\"xxxx\"/g; s/^/\"/g; s/$/\"/g";done`, because "," separates fields in .csv files. – Danusio Gadelha Filho May 05 '20 at 17:13
@Danusio Gadelha Filho Did you solve your issue? If my answer helped solve it, I would appreciate it if you mark it as Accepted. – Ron May 05 '20 at 17:17
@Ron Partially. I just edited my question, but thnx, your idea worked well. – Danusio Gadelha Filho May 05 '20 at 17:29

user1934428 · Answer 2 · 2020-05-06T08:11:04.670


(UPDATED)

The newlines are lost when you do the final echo. If you do not need backslash escape sequences to be interpreted (which is what `echo -e` requests, and which IMO doesn't make sense in your case anyway, at least not for your example input), do a

cat <<<"$arq" >> "$nome_arq$append"

instead.
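The same effect applies to every unquoted `echo $arq` in the question, not only the last one: the shell splits the unquoted value on whitespace, newlines included, and `echo` joins the words with single spaces. The following small demonstration is a sketch; the sample text is a made-up stand-in for the file contents.

```shell
# Hypothetical sample standing in for the contents of the input file.
arq=$(printf 'Data\tFechamento\n30 Abr 2020\t2,00\n29 Abr 2020\t1,24')

# Unquoted: $arq is split on whitespace (newlines included) and echo
# joins the resulting words with single spaces, so sed sees one line.
unquoted=$(echo $arq | sed 's/,/./g')

# Quoted: the value reaches sed unchanged, so the line breaks survive.
quoted=$(printf '%s\n' "$arq" | sed 's/,/./g')

printf '%s\n' "$unquoted" | wc -l   # counts 1 line
printf '%s\n' "$quoted" | wc -l     # counts 3 lines
```

Quoting every expansion of `$arq` in the pipeline, not just the one that writes the file, should therefore keep the original line breaks.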

edited May 06 '20 at 08:11

answered May 04 '20 at 09:07


I got the same result with this syntax :/ – Danusio Gadelha Filho May 05 '20 at 17:30
Can't be **if** there are really newlines in your variable. Did you verify it by doing a `xxd <<<"$arq"`? – user1934428 May 06 '20 at 07:21
@DanusioGadelhaFilho : Could you try my updated solution? – user1934428 May 06 '20 at 08:11

score 0 · Accepted Answer · answered May 13 '20 at 16:05

Final code:

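# Ask for the name of the raw (unstructured) input file.
printf 'Arquivo nao-estruturado: '
read -r nome_arq

# Work on a copy so the original file stays untouched.
arq="${nome_arq}_clean"
cp "$nome_arq" "$arq"

# Strip %, (), thousands dots and plus signs; turn month names into numbers; decimal comma -> point.
sed -i 's/%//g;s/()//g;s/\.//g;s/+//g;s/ Mai /_05_/g;s/ Abr /_04_/g;s/ Mar /_03_/g;s/,/./g' "$arq"
# Collapse every run of whitespace into a single comma.
sed -r -i 's/[[:space:]]+/,/g' "$arq"
# Rename the header columns.
sed -i 's/Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume/ref.date,price.close,var,var.perc,price.open,price.high,price.low,volume/g' "$arq"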
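The cleaning steps could also run in a single `sed` pass that writes to standard output instead of editing a copy in place. This is only a sketch: `clean_table` is a hypothetical helper name, and the character class `[%()+.]` is an assumption that deleting every parenthesis (rather than only the pair `()`) is acceptable for this data.

```shell
# clean_table (hypothetical name): read the raw table from the file
# given as $1 and print the cleaned CSV lines to standard output.
clean_table() {
    sed -E '
        # Delete %, parentheses, plus signs and thousands-separator dots.
        s/[%()+.]//g
        # Turn the month abbreviations into numeric date parts.
        s/ Mai /_05_/g
        s/ Abr /_04_/g
        s/ Mar /_03_/g
        # Decimal comma to decimal point, then whitespace runs to commas.
        s/,/./g
        s/[[:space:]]+/,/g
    ' "$1"
}
```

It would be used as `clean_table "$nome_arq" > "${nome_arq}_clean"`; the header renaming can stay as a separate step on the resulting file.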
