0

I have got a data.frame with 4 columns and ca. 6000 rows.

The columns are:

ID, Day (the number of day since the first day of the time period considered), Year (the year in which the observation has been recorded) and Count (the number of observations pooled, between all the IDs, in that particular day).

df = read.table(text = 'ID     Day         Year   Count
        33012   12448         2001      46
        35004   12448         2001      46
        35008   12448         2001      46
        37006   12448         2001      46
        21009    4835         1980      44
        24005    4835         1980      44
        27001    4835         1980      44
        27002    4835         1980      44
        25005    5569         1982      34
        29001    5569         1982      34
        29002    5569         1982      34
        30003    5569         1982      34', header = TRUE)

I need to create a time window of three days and run a for loop for each Day, counting the number of observations in that time range.

e.g. starting from Day 12448 (or "day 0") I need to check in all the dataframe for Day 12447 (day prior or "day -1") and Day 12449 (day after or "day +1") if exist observations recorded. In other words, taking di = 12448, there exist any "di +1" and/or "di -1" in the dataframe? If yes, I have to delete from the dataframe "di +1" and/or "di -1" in order to avoid overlapping and add both their "Count" values into the "di" 's "Count" observation.

Do you have any hint that can help me making the for loop?


@thepule, thanks a lot. I tried to run your code within my dataset. So I created a vector with all the days in the column "Day"

days <- unique(df$Day)

and adjusted the for loop appropriately, but it doesn't work, in the sense that I obtain very low values in the column Count.

Where is the mistake?


Here an example of my dataframe:

df = read.table(text ='ID    Day       Year     Count
         33012  12448       2001         5
         35004  12448       2001         5
         35008  12448       2001         5
         37006  12448       2001         5
         37008  12448       2001         5
         27900  12800       2002         4
         27987  12800       2002         4
         27123  12800       2002         4
         27341  12800       2002         4
         56124  14020       2003         3
         12874  14020       2003         3
         11447  14020       2003         3
         11231  12447       2001         2
         31879  12447       2001         2
         56784  12449       2001         1
         64148  12799       2002         1
         45613  12801       2001         1
         77632  10324       1991         1
         55313  14002       2003         1
         11667  14019       2003         1', header = TRUE)

My output, after the for loop, should be:

         ID    Day       Year     Count
1     33012  12448       2001         8
2     35004  12448       2001         8
3     35008  12448       2001         8
4     37006  12448       2001         8
5     37008  12448       2001         8
6     27900  12800       2002         6
7     27987  12800       2002         6
8     27123  12800       2002         6
9     27341  12800       2002         6
10    56124  14020       2003         4
11    12874  14020       2003         4
12    11447  14020       2003         4
13    77632  10324       1991         1
14    55313  14002       2003         1

n.b each observations, for each ID, are max 1 per year. n.b.b. the Count column is ordered as decreasing = TRUE

  • what about the first entry?in that case there will be no "di-1", but "di+1" only. The opposite for occurs for the last entry. – Sergio.pv Jul 12 '16 at 15:56

1 Answers1

0

Updated answer:

# Create data frame    
tt <- read.table(text = "
        ID    Day       Year     Count
 33012  12448       2001         5
 35004  12448       2001         5
 35008  12448       2001         5
 37006  12448       2001         5
 37008  12448       2001         5
 27900  12800       2002         4
 27987  12800       2002         4
 27123  12800       2002         4
 27341  12800       2002         4
 56124  14020       2003         3
 12874  14020       2003         3
 11447  14020       2003         3
 11231  12447       2001         2
 31879  12447       2001         2
 56784  12449       2001         1
 64148  12799       2002         1
 45613  12801       2001         1
 77632  10324       1991         1
 55313  14002       2003         1
 11667  14019       2003         1", header= T)


# Vector of day targets you want to repeat the procedure for
targets <- unique(tt$Day)


for (i in targets) {
    temp <- tt$Count[tt$Day == i]
    if(length(temp >0)) {
            condition <- tt$Day == i - 1
            if(any(condition)) {
                    tt$Count[tt$Day == i] <- mean(tt$Count[condition]) +  tt$Count[tt$Day == i]
                    tt <- tt[!condition,]
            }
            condition2 <- tt$Day == i + 1
            if(any(condition2)) {
                    tt$Count[tt$Day == i] <- mean(tt$Count[condition2]) +  tt$Count[tt$Day == i]
                    tt <- tt[!condition2,]
    }

    }
}

Output:

tt
  ID   Day Year Count
1  33012 12448 2001     8
2  35004 12448 2001     8
3  35008 12448 2001     8
4  37006 12448 2001     8
5  37008 12448 2001     8
6  27900 12800 2002     6
7  27987 12800 2002     6
8  27123 12800 2002     6
9  27341 12800 2002     6
10 56124 14020 2003     4
11 12874 14020 2003     4
12 11447 14020 2003     4
18 77632 10324 1991     1
19 55313 14002 2003     1
thepule
  • 1,721
  • 1
  • 12
  • 22
  • hi @user181187, it's a bit difficult to understand where the problem lies without seeing the actual dataframe and the output. Can you create a simplified version of the data set and of the expected output? See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to create a reproducible example. Also, is my example above working as you expect it to work? – thepule Jul 12 '16 at 20:40
  • yes, your code it fine for my purpose. I will update again my question soon. –  Jul 12 '16 at 20:44
  • Updated my answer. The problem was due to the repeating of the "windows limits" in the for loop. – thepule Jul 12 '16 at 21:46
  • Anyway I posted another similar question as I would like to adjust the code for a slightly different analysis. http://stackoverflow.com/questions/38350903/3-days-time-window-for-observations-in-a-dataframe –  Jul 13 '16 at 12:01