
I have data from an Internet table in a text file. I need to convert this file to standard .csv format (comma-separated values) and clean it. For example:

Data    Fechamento  Variação    Variação (%)    Abertura    Máxima  Mínima  Volume
30 Abr 2020     2,00    0,76    61,29%  1,99    2,10    1,80    152.100
29 Abr 2020     1,24    -0,44   -26,19%     1,28    1,71    1,20    125.700

My code:

echo -e "File: \c"
read nome_arq

arq=$(<$nome_arq)

arq=$(echo $arq | sed 's/%//g')
arq=$(echo $arq | sed 's/()//g')
arq=$(echo $arq | sed 's/\.//g')
arq=$(echo $arq | sed 's/\+//g')
arq=$(echo $arq | sed 's/ Abr /_04_/g')
arq=$(echo $arq | sed 's/ Mar /\_03_/g')
arq=$(echo $arq | sed 's/\,/\./g')
arq=$(echo $arq | sed 's/\ /\,/g')

append="_clean"
echo -e $arq >> $nome_arq$append 

However, there are no line breaks in the output; the output file has just a single line:

Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume,30_04_2020,2.00,0.76,61.29,1.99,2.10,1.80,152100,29_04_2020,1.24,-0.44,-26.19,1.28,1.71,1.20,125700,

What can I do to keep the original line breaks in my output?
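The symptom can be reproduced in isolation. When `$arq` is unquoted in `echo $arq`, the shell splits the value on whitespace (spaces, tabs and newlines) and `echo` joins the resulting words with single spaces; quoting the variable passes the text through unchanged. A minimal sketch, where `sample` is just an illustrative variable holding two rows in the format shown above:

```shell
#!/usr/bin/env bash
# Illustrative two-row sample: a tab between columns, a newline between rows.
sample=$'30 Abr 2020\t2,00\n29 Abr 2020\t1,24'

# Unquoted: word splitting turns the tab and the newline into single spaces.
unquoted=$(echo $sample)

# Quoted: the text reaches echo untouched, newline included.
quoted=$(echo "$sample")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted version prints everything on one line, which is exactly what happens at each `arq=$(echo $arq | sed ...)` step in the script above.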

Edit, May 5:

I got the result I wanted with the following code:

append="_clean"
cat $nome_arq|while read z;do echo "$z"|sed "s/\s\+/\"xxxx\"/g; s/^/\"/g; s/$/\"/g";done >> $nome_arq$append

sed 's/%//g' $nome_arq$append > output
rm $nome_arq$append
sed 's/()//g' output > output1
rm output
sed 's/\.//g' output1 > output2
rm output1
sed 's/\+//g' output2 > output3
rm output2
sed 's/\"//g' output3 > output4
rm output3
sed 's/xxxxMaixxxx/_05_/g' output4 > output5
rm output4
sed 's/xxxxAbrxxxx/\_04_/g' output5 > output6
rm output5
sed 's/xxxxMarxxxx/\_03_/g' output6 > output7
rm output6
sed 's/,/\./g' output7 > output8
rm output7
sed 's/xxxx/,/g' output8 > output9
rm output8

Obviously, it's far from optimized; for example, I couldn't get the "tr" command to work. How can I make my script leaner?
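One way to make it leaner is to drop the temporary files and the "xxxx" placeholder entirely: run all substitutions in a single `sed` process that reads the raw table on standard input and writes the cleaned text to standard output, so the line structure is never touched. The placeholder becomes unnecessary if the decimal commas are turned into dots before the whitespace is turned into commas. A sketch, assuming whitespace-separated columns and dates such as "30 Abr 2020"; `clean_table` is a hypothetical name, and `sed -E` (extended regular expressions) is accepted by both GNU and BSD sed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the raw table on stdin, writes CSV on stdout.
# Assumes whitespace-separated columns and dates such as "30 Abr 2020".
clean_table() {
    sed -E '
        # drop percent signs, plus signs and thousands-separator dots
        s/[%+.]//g
        # drop the "()" that remains of "(%)" in the header
        s/\(\)//g
        # month abbreviations seen so far -> month numbers
        s/ Mar /_03_/g
        s/ Abr /_04_/g
        s/ Mai /_05_/g
        # decimal comma -> decimal point (the thousands dots are already gone)
        s/,/./g
        # trim trailing whitespace so lines do not end with a comma
        s/[[:space:]]+$//
        # every remaining run of whitespace becomes one field separator
        s/[[:space:]]+/,/g
    '
}
```

Usage would be something like `clean_table < "$nome_arq" > "${nome_arq}_clean"`. Other month names would need their own substitution lines.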

Edit, May 13:

The final code, with some modifications:

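read -rp "Unstructured file: " nome_arq

# Work on a copy so the original file is left untouched
cp "$nome_arq" "${nome_arq}_clean"
arq="${nome_arq}_clean"

# Drop %, "()", thousands dots and plus signs; month names -> numbers; decimal comma -> dot
sed -i 's/%//g;s/()//g;s/\.//g;s/+//g;s/ Mai /_05_/g;s/ Abr /_04_/g;s/ Mar /_03_/g;s/,/./g' "$arq"
# Collapse every run of whitespace into a single comma
sed -r -i 's/[[:space:]]+/,/g' "$arq"
# Rename the header fields
sed -i 's/Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume/ref.date,price.close,var,var.perc,price.open,price.high,price.low,volume/g' "$arq"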
  • Since many of your `sed` commands simply delete certain characters, you can do this more easily with, e.g., `arq=$(tr -d '%().+' <<<"$arq")` in bash, or `arq=$(echo "$arq"|tr -d '%().+')` in a POSIX shell (from your question, it is not clear which one you want to use). – user1934428 May 04 '20 at 08:33
  • I'm using bash, thanks! – Danusio Gadelha Filho May 05 '20 at 15:23
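A quick check of the `tr` suggestion: as long as `$arq` is quoted, `tr` leaves the newlines alone. Two details matter here: `tr -d '%().+'` removes each listed character individually (any parenthesis, not only an empty "()" pair), and the comma-to-dot translation has to run after the thousands dots are deleted. A sketch with illustrative data, not the full cleaning script:

```shell
#!/usr/bin/env bash
# Illustrative data: a header fragment and a data row, separated by a newline.
arq=$'Variação (%)\n61,29%\t152.100\t+0,76'

# Delete %, (, ), . and + in one pass; the quoted here-string keeps the newline.
arq=$(tr -d '%().+' <<<"$arq")

# Only now turn decimal commas into dots (the thousands dots are already gone).
arq=$(tr ',' '.' <<<"$arq")

printf '%s\n' "$arq"
```

The month names and the whitespace-to-comma step still need `sed`, because `tr` only maps single characters.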

3 Answers


Try this:

cat your_input_File|while read z;do echo "$z"|sed "s/\s\+/\",\"/g; s/^/\"/g; s/$/\"/g";done

This will return:

"Data","Fechamento","Variação","Variação","(%)","Abertura","Máxima","Mínima","Volume"
"30","Abr","2020","2,00","0,76","61,29%","1,99","2,10","1,80","152.100"
"29","Abr","2020","1,24","-0,44","-26,19%","1,28","1,71","1,20","125.700"
Ron
  • Please replace the useless `cat` with input redirection. – user1934428 May 04 '20 at 08:35
  • `cat` is used for ease of understanding, and is far from `useless`. `input redirect` is just another way of feeding in data, which does not render `cat` useless! – Ron May 04 '20 at 08:53
  • At least it is an additional process which could be avoided. See [here](https://stackoverflow.com/questions/11710552/useless-use-of-cat) for a discussion of the topic. – user1934428 May 04 '20 at 08:57
  • Yes, it is an additional process; however, it is easy to understand in the example above, which is more practical in this case. Or do you have a suggestion for how this can be rewritten in assembler so that we are REALLY optimal?? – Ron May 04 '20 at 09:09
  • Why do we need an assembler solution here? If you just redirect the input into the `while` loop, the `cat` process is not needed, and the overall solution looks cleaner IMO... although legibility is of course always subjective. – user1934428 May 04 '20 at 09:12
  • I used `cat your_input_File|while read z;do echo "$z"|sed "s/\s\+/\"xxxx\"/g; s/^/\"/g; s/$/\"/g";done`, because "," separates fields in .csv files. – Danusio Gadelha Filho May 05 '20 at 17:13
  • @Danusio Gadelha Filho Did you solve your issue? If my answer helped solve it, I would appreciate it if you mark it as Accepted. – Ron May 05 '20 at 17:17
  • @Ron Partially. I just edited my question, but thanks, your idea worked well. – Danusio Gadelha Filho May 05 '20 at 17:29
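For reference, the input-redirection form discussed in the comments, next to a loop-free variant: `sed` already processes its input line by line, so the `while read` loop is not strictly needed. Both are sketches; `quote_fields_loop` and `quote_fields` are hypothetical names, and the GNU-only `\s\+` is replaced here by the portable `[[:space:]]+` under `sed -E`. Unlike a plain `read z`, `IFS= read -r` does not strip leading blanks or backslashes, so the two functions behave identically.

```shell
#!/usr/bin/env bash
# Loop version: input redirection instead of `cat file |`.
# IFS= and -r keep each line intact; the `|| [ -n "$z" ]` test also
# processes a last line that has no trailing newline.
quote_fields_loop() {
    while IFS= read -r z || [ -n "$z" ]; do
        printf '%s\n' "$z" | sed -E 's/[[:space:]]+/","/g; s/^/"/; s/$/"/'
    done < "$1"
}

# Loop-free version: a single sed process handles every line of the file.
quote_fields() {
    sed -E 's/[[:space:]]+/","/g; s/^/"/; s/$/"/' "$1"
}
```

The loop version starts one `sed` per line; the loop-free one starts a single `sed` for the whole file.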

(UPDATED)

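The newlines are lost because $arq is unquoted: in echo $arq, the shell splits the value on whitespace (newlines included) and echo joins the words back with single spaces. This already happens in the first arq=$(echo $arq | sed ...), so quote the variable in every step (echo "$arq" | sed ...). For the final write, if you do not need the interpretation of backslash sequences (which you request by using echo -e, and which IMO doesn't make sense in your case anyway, at least not for your example input), do a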

cat <<<"$arq" >> "$nome_arq$append" 

instead.

user1934428
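Applied to the script from the question, the fix means quoting $arq in every expansion, not only in the final write. Once the newlines survive, the tabs between columns survive too, so the original "one space to one comma" substitution has to become "one run of whitespace to one comma". A sketch that keeps the question's variable-based approach; `clean_file` is a hypothetical wrapper, and `printf` replaces `echo -e` so that backslashes in the data are not interpreted:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the question's steps; $1 is the input file name.
clean_file() {
    local nome_arq=$1 arq
    arq=$(<"$nome_arq")

    # Every expansion is quoted, so the newlines survive each step.
    arq=$(printf '%s\n' "$arq" | sed 's/%//g; s/()//g; s/\.//g; s/+//g')
    arq=$(printf '%s\n' "$arq" | sed 's/ Abr /_04_/g; s/ Mar /_03_/g; s/,/./g')
    # Runs of spaces/tabs (not single spaces) become one comma.
    arq=$(printf '%s\n' "$arq" | sed 's/[[:space:]]\{1,\}/,/g')

    # Append, like the original script, to "<input>_clean".
    printf '%s\n' "$arq" >> "${nome_arq}_clean"
}
```

Because the output is appended, running it twice on the same input duplicates the rows; use `>` instead of `>>` if that is not wanted.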

Final code:

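read -rp "Unstructured file: " nome_arq

# Work on a copy so the original file is left untouched
cp "$nome_arq" "${nome_arq}_clean"
arq="${nome_arq}_clean"

# Drop %, "()", thousands dots and plus signs; month names -> numbers; decimal comma -> dot
sed -i 's/%//g;s/()//g;s/\.//g;s/+//g;s/ Mai /_05_/g;s/ Abr /_04_/g;s/ Mar /_03_/g;s/,/./g' "$arq"
# Collapse every run of whitespace into a single comma
sed -r -i 's/[[:space:]]+/,/g' "$arq"
# Rename the header fields
sed -i 's/Data,Fechamento,Variação,Variação,Abertura,Máxima,Mínima,Volume/ref.date,price.close,var,var.perc,price.open,price.high,price.low,volume/g' "$arq"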