
I have a folder with multiple csv files inside, named sequentially (starting from 00000.csv up to a generic #####.csv). Each csv file has 4 columns and a variable number of rows, N.

I'd like to write a script to place inside the folder that, when executed, reads each csv file in order. For the i-th csv file, it should sum all N values in the third column to obtain the value t, sum all N values in the fourth column to obtain the value q, calculate the final value sqrt(t^2 + q^2), and print it on the i-th row of an output file (txt, for example) generated in the same folder as the csv files.

I'd like to have something automated, a run-and-forget-it kind of approach, and not just a command to change every time.


Following @Ed Morton's advice, here is the code I have managed to write so far:

#!/bin/bash
shopt -s nullglob
for f in *.csv
do
        cat "$f" | awk -F "," '{sum3 += $3} {sum4 += $4} {final = sqrt(sum3^2 + sum4^2)} END {print final}' > result.txt
done

It seems to do what I need, but result.txt only contains the correct value for the last csv file, because each iteration overwrites the previous one.
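For reference, the overwrite comes from `>` truncating result.txt on every pass of the loop. Below is a minimal sketch of one way around it: keep the loop, but redirect its output once, after `done`. The scratch directory and the two tiny csv files are made up for illustration and are not the data from this question.

```shell
#!/bin/bash
# Demo setup (hypothetical data): a scratch directory with two small csv files
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '0, 0, 3, 4\n' > 00000.csv
printf '0, 0, 1, 2\n0, 0, 5, 6\n' > 00001.csv

shopt -s nullglob
for f in *.csv
do
    # One awk run per file, so the sums start from zero each time
    awk -F ',' '{ sum3 += $3; sum4 += $4 } END { printf "%.6f\n", sqrt(sum3^2 + sum4^2) }' "$f"
done > result.txt   # the file is opened once for the whole loop, so nothing is overwritten

cat result.txt
```

Using `>>` inside the loop would also stop the overwriting, but the file would then keep growing across repeated runs unless it is emptied first.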


Suppose I have the following three csv files:

00000.csv

1.817675, 0.859327, 0.959465, 0.281827
4.264659, 3.040230, -0.787732, -0.616018
3.645565, 2.943500, -0.424509, -0.905424
0.603874, 3.858309, -0.302506, -0.953147
0.056403, 0.410131, 0.941520, 0.336956

00001.csv

1.762620, 0.775846, -0.550544, -0.834806
4.364223, 3.049563, 0.995636, 0.093324
3.675804, 2.848182, 0.302385, -0.953186
0.696330, 3.820203, 0.924550, -0.381060
0.154763, 0.428169, 0.983598, 0.180376

00002.csv

1.781079, 0.677564, 0.184586, -0.982816
4.264546, 3.057596, -0.996768, 0.080330
3.718724, 2.757861, 0.429205, -0.903207
0.733074, 3.913208, 0.367446, 0.930045
0.088634, 0.353155, -0.661285, -0.750135

What I'd like to get in the end would be the following result.txt file:

result.txt

1.895572658137904
3.262622157794096
1.761036700624096

Where, for example,

1.895572658137904 = sqrt[ (0.959465-0.787732-0.424509-0.302506+0.941520)^2 + (0.281827-0.616018-0.905424-0.953147+0.336956)^2 ]

and so on for the other values.
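The arithmetic for the first file can be checked by hand. This is only a sketch that hard-codes the third and fourth columns of 00000.csv from the sample above; the variable names `t`, `q` and `out` are chosen here for illustration.

```shell
#!/bin/bash
# Recompute the first expected value from the sample rows of 00000.csv
out=$(awk 'BEGIN {
    # Sum of column 3 and sum of column 4, typed in by hand
    t = 0.959465 - 0.787732 - 0.424509 - 0.302506 + 0.941520
    q = 0.281827 - 0.616018 - 0.905424 - 0.953147 + 0.336956
    printf "t = %.6f\nq = %.6f\nresult = %.6f\n", t, q, sqrt(t^2 + q^2)
}')
printf '%s\n' "$out"
```

The two sums come out as t = 0.386238 and q = -1.855806, which gives 1.895573 when rounded to six decimals.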

  • Possible duplicate of [Awk: Sum up column values across multiple files with identical column layout](https://stackoverflow.com/questions/44597484/awk-sum-up-column-values-across-multiple-files-with-identical-column-layout) – Krishna Sep 30 '19 at 17:56
  • I'm a bit of a n00b so if it is indeed a duplicate and I could write a script from that I'd be very grateful to anyone able to guide me, as I haven't got a clue – lucia de finetti Sep 30 '19 at 18:04
  • @Krishna how should I modify it to adapt it to my case? – lucia de finetti Sep 30 '19 at 18:11
  • `reads each csv file progressively` - https://www.cyberciti.biz/faq/bash-loop-over-file/ `sum all the N values in the third column ... ` - [tutorialspoint](https://www.tutorialspoint.com/awk/index.htm) looks like a good introduction, but just search `awk tutorial` in google. If you want others to do the job for you, try freelancing sites. This forum is for "specific programming questions", as such I believe your question is "too broad" - you ask how to do many things without providing your own code. Get to know awk, it's one of the easiest languages to learn ever. – KamilCuk Sep 30 '19 at 18:16
  • @KamilCuk This is the first time I've heard of awk, even if in a slightly passive-aggressive fashion :) Anyway, thank you very much! – lucia de finetti Sep 30 '19 at 18:20
  • A good rule of programming is something called [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming) - split the problem into a sum of very small problems. Then solve them one at a time. How do you loop over files in a directory? How do you sum values in a column? How do you save that sum in a variable? How do you calculate a power of two of a number? How do you calculate a square? How do you print the i-th row of a file? Technologies you should be interested in - a simple `for i in` bash loop and a small awk script look like enough to do the job. – KamilCuk Sep 30 '19 at 18:20
  • See https://stackoverflow.com/q/45420535/1745001 for how to parse CSVs with awk and include concise, testable sample input plus expected output plus what you've tried so far if you still have a question after that. – Ed Morton Sep 30 '19 at 19:53
  • @EdMorton Thanks for your guidance so far! Any suggestion on what I'm doing wrong with my code and how to fix it? – lucia de finetti Sep 30 '19 at 22:51
  • @luciadefinetti I can see some specific issues with your code but what the right solution is depends on what EXACTLY you're trying to do and we'll know for sure what that is once you add sample input/output. It's good that you've added your attempt, now add some concise, testable sample input (e.g., say 3 CSV files) and expected output and then you'll have posted a complete question that we can help you answer. – Ed Morton Sep 30 '19 at 22:51
  • @EdMorton Thanks again for taking the time! I added a testable sample, I'm sorry I didn't do it before :) – lucia de finetti Sep 30 '19 at 23:06

2 Answers

Using GNU awk for ENDFILE and tested with your provided sample input/output:

awk -F ',' '
    { sum3 += $3; sum4 += $4 }
    ENDFILE { printf "%.15f\n", sqrt(sum3^2 + sum4^2); sum3=sum4=0 }
' *.csv
1.895572658137904
3.262622157794095
1.761036700624095

and with any awk:

awk -F ',' '
    { sum3[FILENAME] += $3; sum4[FILENAME] += $4 }
    END {
        for (i=1; i < ARGC; i++) {
            fname = ARGV[i]
            printf "%.15f\n", sqrt(sum3[fname]^2 + sum4[fname]^2)
        }
    }
' *.csv
1.895572658137904
3.262622157794095
1.761036700624095
Ed Morton
  • Thanks very very much! How should I modify this to have the output in a txt file? fprintf? – lucia de finetti Sep 30 '19 at 23:13
  • No, just add `> textfile` to the end of the last line of the command just like you would any other shell command and just like you did in the code you wrote. – Ed Morton Sep 30 '19 at 23:14
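As a sketch of what that looks like end to end, the script below recreates the three sample files from the question in a scratch directory (the `mktemp` directory is an assumption made for the demo; in practice the script would simply sit in the csv folder) and runs the portable awk version with the output redirected to result.txt.

```shell
#!/bin/bash
# Demo setup: scratch directory holding the three sample files from the question
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > 00000.csv <<'EOF'
1.817675, 0.859327, 0.959465, 0.281827
4.264659, 3.040230, -0.787732, -0.616018
3.645565, 2.943500, -0.424509, -0.905424
0.603874, 3.858309, -0.302506, -0.953147
0.056403, 0.410131, 0.941520, 0.336956
EOF
cat > 00001.csv <<'EOF'
1.762620, 0.775846, -0.550544, -0.834806
4.364223, 3.049563, 0.995636, 0.093324
3.675804, 2.848182, 0.302385, -0.953186
0.696330, 3.820203, 0.924550, -0.381060
0.154763, 0.428169, 0.983598, 0.180376
EOF
cat > 00002.csv <<'EOF'
1.781079, 0.677564, 0.184586, -0.982816
4.264546, 3.057596, -0.996768, 0.080330
3.718724, 2.757861, 0.429205, -0.903207
0.733074, 3.913208, 0.367446, 0.930045
0.088634, 0.353155, -0.661285, -0.750135
EOF

# Portable awk version from the answer; the only change is "> result.txt" at the end
awk -F ',' '
    { sum3[FILENAME] += $3; sum4[FILENAME] += $4 }
    END {
        # ARGV keeps the files in command-line order, i.e. the sorted glob order
        for (i = 1; i < ARGC; i++) {
            fname = ARGV[i]
            printf "%.15f\n", sqrt(sum3[fname]^2 + sum4[fname]^2)
        }
    }
' *.csv > result.txt

cat result.txt
```

Because result.txt does not end in .csv, it is not picked up by the `*.csv` glob on later runs.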

OK I am not the best at this but I think this might work (I also did not understand the last part so I just threw the final number in the txt file):

for file in [here you put the path of the directory with csv files]/*.csv; do
t=0
q=0 
    for row in $file;do 
        tt=`echo "$row"|awk -F, '{print $3}'`
        tq=`echo "$row"|awk -F, '{print $4}'`
        t=`echo $((t + tt))`
        q=`echo $((q + tq))`
    done
t=`echo $((t ** 2))`
q=`echo $((q ** 2))`
##finalv is the variable for the final value  
finalv=`echo $((t + q))`
echo "$finalv" >> [here you put the path of the directory with csv files]/file.txt
done
  • You should not use old and deprecated backticks, use parentheses like this: `t=$(echo $((t ** 2)))` – Jotne Sep 30 '19 at 19:12
  • Could I run into any problems while using backticks? Is this a best practice thing that I am not aware of? – Vitor Lima Sep 30 '19 at 19:28
  • It works, still, but since it's deprecated it may go away. Read this: https://stackoverflow.com/questions/4708549/what-is-the-difference-between-command-and-command-in-shell-programming – Jotne Sep 30 '19 at 19:46
  • See http://mywiki.wooledge.org/BashFAQ/082 for the backticks issue and also see https://unix.stackexchange.com/q/169716/133219 for why not to do this at all. – Ed Morton Sep 30 '19 at 19:52
  • Thank you very much! I did not know that – Vitor Lima Sep 30 '19 at 19:52
  • This doesn't work for me but obviously thanks very much for trying to help me! – lucia de finetti Sep 30 '19 at 22:49
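Two things likely keep the loop in this answer from working: `for row in $file` iterates over the file name rather than over the rows of the file, and `$(( ))` only does integer arithmetic. The sketch below shows how the same loop-based idea might be repaired, using `$(...)` instead of backticks. It is for illustration only, since it starts several awk processes per row and is much slower than the single-awk answers; the scratch directory and the two small csv files are hypothetical demo data.

```shell
#!/bin/bash
# Demo setup (hypothetical data): a scratch directory with two small csv files
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '0, 0, 3, 4\n' > 00000.csv
printf '0, 0, 1, 2\n0, 0, 5, 6\n' > 00001.csv

for file in *.csv; do
    t=0
    q=0
    # Read the file row by row, splitting on commas; c3 and c4 hold columns 3 and 4
    while IFS=, read -r _ _ c3 c4; do
        # Bash arithmetic is integer-only, so the floating-point sums are delegated to awk
        t=$(awk -v a="$t" -v b="$c3" 'BEGIN { printf "%.15g", a + b }')
        q=$(awk -v a="$q" -v b="$c4" 'BEGIN { printf "%.15g", a + b }')
    done < "$file"
    # Final value for this file: sqrt(t^2 + q^2)
    awk -v t="$t" -v q="$q" 'BEGIN { printf "%.6f\n", sqrt(t^2 + q^2) }'
done > file.txt

cat file.txt
```

Note that `read` skips a final row that has no trailing newline, so the input files are assumed to end with one.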