1

I was reading this extremely useful question: How to use shell variables in an awk script

But I can't get it working as I want to in my script. A Part of the core of the script is the following:

#!/bin/bash

# VARS #
XLSX=$1
SHEET=$2

P_PATH="/tmp/test"
EXTRACTCSV="$P_PATH/extract.csv"
TEMPCSV="$P_PATH/out_column.csv"

COLS=4
T_COLS=$(($COLS +1))
COUNTER=2010

# START #
[ -d $P_PATH ] || mkdir $P_PATH
rm -rf $P_PATH/*

ssconvert --export-type=Gnumeric_stf:stf_assistant -O 'sheet='$SHEET' separator=; format=automatic eol=unix' ${XLSX} ${EXTRACTCSV} >/dev/null 2>&1

until [ ${COUNTER} -eq 2018 ]; do
  echo "year: $COUNTER"
  col1=$((${COLS} - 3))
  col2=$((${COLS} - 2))
  col3=$((${COLS} - 1))
  col4=${COLS}
  echo "col1: $col1 - col2: $col2 - col3: $col3 - col4: $col4"
  echo "tail -n +3 ${EXTRACTCSV} | awk -F';' -v var1="$col1" -v var2="$col2" -v var3="$col3" -v var4="$col4" 'BEGIN {print $var1";"$var2";"$var3";"$var4}' |  sed 's/\./\,/g'" #> ${TEMPCSV}"."${COUNTER} # Skip first 2 rows and working on the current column only

  COLS=$(($COLS + $T_COLS))
  let COUNTER+=1
done

Part of EXTRACTCSV:

"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2";;"DAY START";"DATA1";"PERCENTAGE";"DATA2"
2010/01/01;76.1838;0.9497;73.4787;;2011-01-01;400.6334;0.7429;539.2905;;2012/01/01;478.4495;0.8154;586.7632;;2013-01-01;514.3688;0.7406;116.4914;;2014-01-01;602.3154;0.7031;856.6758;;2015-01-01;709.1598;0.7457;950.9946;;2016-01-01;696.2204;0.7006;993.735;;2017/01/01;666.3247;0.6373;1045.5864
2010/01/02;75.9292;0.9465;73.4787;;2011-01-02;398.9055;0.7397;539.2903;;2012-01-02;478.3616;0.8153;586.7631;;2013-01-02;511.6213;0.7367;116.4996;;2014-01-02;601.0048;0.7016;856.6742;;2015-01-02;704.6024;0.7409;950.9953;;2016-01-02;692.7325;0.6971;993.8134;;2017/01/02;659.8746;0.6311;1045.5821
2010/01/03;75.5417;0.9417;73.4787;;2011-01-03;396.1734;0.7346;539.29;;2012-01-03;476.7519;0.8125;586.7638;;2013-01-03;508.9712;0.7329;116.4981;;2014-01-03;599.4631;0.6998;856.6302;;2015-01-03;700.7211;0.7368;950.9943;;2016-01-03;688.9282;0.6932;993.9072;;2017/01/03;652.2328;0.6238;1045.5821
2010/01/04;74.9889;0.9348;73.4787;;2011-01-04;392.8654;0.7285;539.2906;;2012-01-04;475.0166;0.7396;586.7636;;2013-01-04;506.9215;0.7299;116.4955;;2014-01-04;598.3154;0.6985;856.5889;;2015-01-04;696.8216;0.7327;950.9946;;2016-01-04;683.3585;0.6875;993.9379;;2017/01/04;644.9272;0.6168;1045.6002
2010/01/05;74.3354;0.9267;73.4787;;2011-01-05;389.7434;0.7227;539.2906;;2012-01-05;473.383;0.7368;586.7636;;2013-01-05;505.1373;0.7273;116.4953;;2014-01-05;597.1538;0.6971;856.6002;;2015-01-05;691.2058;0.7268;950.9937;;2016-01-05;677.7324;0.6819;993.9349;;2017/01/05;636.1736;0.6084;1045.6002
2010/01/06;73.5865;0.9173;73.4787;;2011-01-06;388.1988;0.7177;540.9071;;2012-01-06;471.874;0.7342;586.7635;;2013-01-06;503.1365;0.7245;116.4973;;2014-01-06;595.5167;0.6952;856.57;;2015-01-06;685.5321;0.7209;950.9938;;2016-01-06;672.7798;0.6769;993.9423;;2017/01/06;627.0485;0.5997;1045.6002
2010/01/07;72.7931;0.9075;73.4787;;2011-01-07;386.287;0.7141;540.9069;;2012-01-07;470.588;0.732;586.7626;;2013-01-07;500.267;0.7202;116.5772;;2014-01-07;593.5777;0.6929;856.6254;;2015-01-07;679.5659;0.7146;950.9971;;2016-01-07;667.4732;0.6716;993.9342;;2017/01/07;619.7181;0.5927;1045.6002
2010/01/08;71.9871;0.8974;73.4787;;2011-01-08;385.0015;0.7118;540.9071;;2012-01-08;469.3661;0.7999;586.7626;;2013-01-08;497.3401;0.716;116.5887;;2014-01-08;591.4693;0.6904;856.6439;;2015-01-08;674.2712;0.709;950.9948;;2016-01-08;662.5751;0.6666;993.9427;;2017/01/08;612.5235;0.5858;1045.6002

With this command:

tail -n +3 ${EXTRACTCSV} | awk -F';' -v var1="$col1" -v var2="$col2" -v var3="$col3" -v var4="$col4" 'BEGIN {print $var1";"$var2";"$var3";"$var4}' |  sed 's/\./\,/g'" #> ${TEMPCSV}"."${COUNTER}

I wish to extract the first 4 columns from EXTRACTCSV file, for 8 times. First time for the year 2010, the second for the 2011 and so on until today.

Expected output:

year 2010 cicle #1
col1       col2    col3   col4
2010/01/01;76.1838;0.9497;73.4787
2010/01/02;75.9292;0.9465;73.4787
2010/01/03;75.5417;0.9417;73.4787
2010/01/04;74.9889;0.9348;73.4787
2010/01/05;74.3354;0.9267;73.4787
2010/01/06;73.5865;0.9173;73.4787
2010/01/07;72.7931;0.9075;73.4787
2010/01/08;71.9871;0.8974;73.4787
year 2011 cicle #2
col6       col7     col8   col9
2011-01-01;400.6334;0.7429;539.2905
2011-01-02;398.9055;0.7397;539.2903
2011-01-03;396.1734;0.7346;539.29
2011-01-04;392.8654;0.7285;539.2906
2011-01-05;389.7434;0.7227;539.2906
2011-01-06;388.1988;0.7177;540.9071
2011-01-07;386.287;0.7141;540.9069
2011-01-08;385.0015;0.7118;540.9071

...And so on for the other years. Can someone tell me what is wrong with how I am using awk?

EDIT: Update for Inian

user@local:~/excel$ ./test2.sh in.xlsx YR_2010_2017
    year: 2010
    col1: 1 - col2: 2 - col3: 3 - col4: 4
    tail -n +3 /tmp/test/extract.csv | awk -v var1=1 -v var2=2 -v var3=3 -v var4=4 'BEGIN{FS=OFS=
    ./test2.sh: line 28: }{print  ,,,}' | sed 's/\./\,/g': No such file or directory
    year: 2011
    col1: 6 - col2: 7 - col3: 8 - col4: 9
    tail -n +3 /tmp/test/extract.csv | awk -v var1=6 -v var2=7 -v var3=8 -v var4=9 'BEGIN{FS=OFS=
    ./test2.sh: line 28: }{print  ,,,}' | sed 's/\./\,/g': No such file or directory
    year: 2012

The other rows are the same..

Community
  • 1
  • 1
aPugLife
  • 989
  • 2
  • 14
  • 25

3 Answers3

1

Why would you run the core of the execution in the BEGIN clause, do it as below

BEGIN{FS=OFS=";"}{print  $var1,$var2,$var3,$var4}

just set the output-field-separator alone in the BEGIN clause. Also remember statements within BEGIN gets executed before acutal processing happens. So your core logic should not be within BEGIN unless you are working only on input variables and not on any input stream.

The full command can be just

tail -n +3 "${EXTRACTCSV}" | awk -v var1="$col1" -v var2="$col2" -v var3="$col3" -v var4="$col4" 'BEGIN{FS=OFS=";"}{print  $var1,$var2,$var3,$var4}'
Inian
  • 80,270
  • 14
  • 142
  • 161
  • I only and always used `awk -F`, I missed how to properly use `BEGIN` so thanks for explaining it!! I'm trying your code right now! – aPugLife Apr 25 '17 at 12:32
  • I am updating the main post soon with the output so you can see formatted, thank you!. The `print` of awk does not print the values – aPugLife Apr 25 '17 at 12:34
  • @Nihvel: You don't even have the complete command, the `awk` command is hanging in the update you did – Inian Apr 25 '17 at 12:38
  • in the main post, the values from col1 to col4 is the year 2010. from col6 to col9 the 2011 and continues. What do you mean with complete command? This is what I am using. There would be another while later on but it is not necessary for this part of the script – aPugLife Apr 25 '17 at 12:42
  • @Nihvel: I meant the awk variables you used `var1` to `var4` – Inian Apr 25 '17 at 12:45
  • Ok it worked now, using the ssh without colors I missed a `"` that broke the code. I'm trying it on the main script! – aPugLife Apr 25 '17 at 12:50
1

Since you want skip the 2 first lines, print the first fields and perform some changes on them, you can use awk all in one shot:

awk 'BEGIN{FS=OFS=";"} NR<=3{next} {NF=4; gsub(/\./, ",")}1' file

Some explanations:

  • BEGIN{FS=OFS=";"} set the input and output field separator to ;.
  • NR<=3{next} skip lines before the 3rd one.
  • {NF=4; gsub(/\./, ",")}1 squeeze the record to the first 4 fields. Then, replace dots with commas using gsub(). Finally, trigger the print by providing a True condition 1.

With your given input it returns:

2010/01/03;75,5417;0,9417;73,4787
2010/01/04;74,9889;0,9348;73,4787
2010/01/05;74,3354;0,9267;73,4787
2010/01/06;73,5865;0,9173;73,4787
2010/01/07;72,7931;0,9075;73,4787
2010/01/08;71,9871;0,8974;73,4787

The code above was just focusing in the first loop. If you want to print all the data in batches of 4 columns each time (2010 -> columns 1 to 4, 2011 -> columns 6 to 9, etc.) then some looping will make:

BEGIN{FS=OFS=";"}
NR<first_line{next}
{for(i=1;i<=NF;i++)
    data[NR][i]=$i
}
END {
    for(col=1;col<=40;col+=5) {
        year=2010+col/5;
        printf "\nYear %d cicle \#%d\n", year, col/5 + 1;
        for (line=first_line;line<=NR;line++) {
            print data[line][col], data[line][col+1], data[line][col+2], data[line][col+3]
        }
    }
}

Save it in a file like script.awk and run it with your data:

$ awk -v first_line=3 -f script.awk file

Year 2010 cicle #1
2010/01/02;75.9292;0.9465;73.4787
2010/01/03;75.5417;0.9417;73.4787
2010/01/04;74.9889;0.9348;73.4787
2010/01/05;74.3354;0.9267;73.4787
2010/01/06;73.5865;0.9173;73.4787
2010/01/07;72.7931;0.9075;73.4787
2010/01/08;71.9871;0.8974;73.4787

(...)

Year 2017 cicle #8
2017/01/02;659.8746;0.6311;1045.5821
2017/01/03;652.2328;0.6238;1045.5821
2017/01/04;644.9272;0.6168;1045.6002
2017/01/05;636.1736;0.6084;1045.6002
2017/01/06;627.0485;0.5997;1045.6002
2017/01/07;619.7181;0.5927;1045.6002
2017/01/08;612.5235;0.5858;1045.6002
fedorqui
  • 275,237
  • 103
  • 548
  • 598
  • I can not see where the awk will print the 6,7,8 and 9 column after the first year, arriving to the column 39 of the CSV file when it is 2017. COLS=4 are the 4 columns to get for each cicle and the T_COLS is COLS+1 because 1 column of space between the years. **impressive** code, really! I am going to test it – aPugLife Apr 25 '17 at 12:40
  • I found the solution already but I upvoted this answer because it is not bad at all! – aPugLife Apr 25 '17 at 13:03
  • @Nihvel oh, I have to admit I did not go through your post throughly and just focused on my free interpretation of your desired output together with the `tail ... | awk ... | sed` code. So yes, this is incomplete and the `{NF=4}` trick does not work then. We can do some [awk-golfing](http://stackoverflow.com/a/18819899/1983854) to remove columns or go ahead and use `print $var1, ...`. – fedorqui Apr 25 '17 at 13:07
  • 1
    No problem ;) it is up to you if you want to update the answer based on the OP requests (me). Though I found the solution that works perfect for my needs, others probably wish to be more professional in their script. I do not know awk so well to be able to play a lor around it. Whatever your choice is, I thank you for the time you put on it (: – aPugLife Apr 25 '17 at 13:13
  • @Nihvel find the general approach to get rid of your script and just use this. – fedorqui Apr 25 '17 at 14:27
  • Thanks for the update! there would be a leap column between each set of 4 columns. that's why the 2011 does not start from col5 rather from col6. But this is the least. I guess it is easy to match the correct column. Well, good job (: thanks! too bad i can not upvote once again – aPugLife Apr 25 '17 at 14:47
  • @Nihvel ok I understand now. Just for completeness, I have updated the solution to fix that error and take that leap column. – fedorqui Apr 26 '17 at 06:46
0

Or if you are just interested in the result and not the method, you can try something like:

#!/bin/bash
CSVFILE="csv.csv"

for year in `seq 0 7`; do
    i=$[year*4+1];
    echo
    echo "Year: 201$year"
    AWKCMD="{print \$$i,\$$[i+1],\$$[i+2],\$$[i+3]}"

    tail -n +3 $CSVFILE | sed 's/;;/;/g' | awk -F ';' -e "$AWKCMD"
done
Sokre
  • 114
  • 6
  • This could work though it is ugly! +1 for trying but I solved already with less changes in the code (the ideal for my logic) – aPugLife Apr 25 '17 at 14:52