I have a daily time series dataset covering 10 years (1995-2004), in which missing values are coded as 9999.00. I would like to compute the annual average for each year, ignoring the missing values.

I was able to do this for a 365-day calendar with the following command:

awk '!/9999\.00/{sum += $1; count++} NR%365==0{print count ? sum/count : "9999.00"; sum=count=0}' ifile

But I am unable to adapt it to a calendar with leap years. I also need to add a column with the year. My desired output is:

1995 annual_average
1996 annual_average
1997 annual_average
....

For example, suppose I have the following data for 1995-2000, and I need to average every 3 lines instead of 365, or every 4 lines instead of 366 in a leap year:

3
3
4
9999.00
4
9999.00
13
3
9999.00
9999.00
9999.00
9999.00
9999.00
3
4
2
2
2.6
5.1
4.5

Trial command:

awk '!/9999\.00/{sum += $1; count++} NR%3==0{print count ? sum/count : "9999.00"; sum=count=0}' ifile

Desired output:

1995  3.33
1996  8.5   leap year, so the average of 4 lines, ignoring missing values: (4+13)/2
1997  3
1998  9999.00
1999  3
2000  3.55   leap year
Kay
  • What do you mean by 366 days calendar? You mean leap years? You should also provide sample data input. – xvan May 27 '16 at 01:51
  • You can determine if a year is a leap year by [this rule](http://stackoverflow.com/a/22352190/1477064) and use that information to change NR%365 to NR%366. My awk-fu isn't enough to attempt that. – xvan May 27 '16 at 02:31
  • Thank you for the additional information, @xvan, but I am unable to modify my script. The main problem is how to switch from NR%365 to NR%366 within a single time series. – Kay May 27 '16 at 03:00
  • Change `NR%365==0` to `count ==DY-1`, where DY is a variable storing the number of days in the current year. How to set its value is what you have to solve. – xvan May 27 '16 at 03:13
  • This is basically the same question as http://stackoverflow.com/questions/37474078/compute-average-if-satisfies-the-given-condition-in-shell-script – Michael Vehrs May 27 '16 at 09:23

1 Answer

This code works for your sample data. Of course, you will need to adjust the target values:

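BEGIN {
    year = 0;      # index of the current year; 0 corresponds to 1995
    target = 3;    # line number on which the current year ends
}
# Accumulate only valid values; 9999.00 marks a missing value.
$1 < 9990.00 {
    sum += $1;
    count++;
}
# End of the current year: print its average, then set up the next year.
NR == target {
    if (count == 0) {
        print "9999";    # every value in this year was missing
    } else {
        print sum / count;
    }
    sum = 0;
    count = 0;
    year++;
    # Years 1 and 5 (1996 and 2000) are leap years and span 4 lines here.
    if (year % 4 == 1) {
        target += 4;
    } else {
        target += 3;
    }
}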

Oh, and keep in mind that this very simplistic leap year calculation (every fourth year) will fail for century years that are not divisible by 400, such as 1900 or 2100, although not for the years you mentioned.
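
For real daily data, the same idea could be sketched with the full Gregorian leap year rule and a year column in the output. This is only a sketch under assumptions: the shell function name `annual_avg` and its three arguments (start year, lines per normal year, lines per leap year) are hypothetical names chosen for illustration, and blank lines are assumed to carry no data.

```shell
# annual_avg START_YEAR [NORMAL_LEN] [LEAP_LEN]
# Reads one value per line on stdin and prints "year average" per year.
# Hypothetical helper; lengths default to 365 and 366 lines.
annual_avg() {
    awk -v year="$1" -v normal="${2:-365}" -v leap="${3:-366}" '
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    function is_leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
    function year_len(y) { return is_leap(y) ? leap : normal }
    function report() {
        if (count) printf "%d %.2f\n", year, sum / count
        else       printf "%d 9999.00\n", year    # all values missing
    }
    BEGIN { target = year_len(year) }
    NF == 0 { next }                      # skip blank lines
    { days++ }                            # every data line counts as a day
    $1 != 9999 { sum += $1; count++ }     # 9999.00 marks a missing value
    days == target {
        report()
        sum = count = days = 0
        year++
        target = year_len(year)
    }
    END { if (days > 0) report() }        # partial final year, if any
    '
}
```

With the sample data above, `annual_avg 1995 3 4 < ifile` should reproduce the desired output (with every average printed to two decimals), and `annual_avg 1995 < ifile` would apply the 365/366-day lengths to the real series.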

Michael Vehrs