
I have a small dataframe containing World Record (WR) times for the women's 10000m. In some years no new record was set, so the previous WR carried over to the following years until a new one was set. I want the dataframe to run through the year 2015.

I need to add rows where the "Year" variable has gaps and fill the gaps with data of the previous WR.

#Current section of dataframe (gap from '86-'93 then '93-2015):

  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  29.32 1993 10000m  women

#Required result:

  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  30.14 1987 10000m  women
8  30.14 1988 10000m  women
9  30.14 1989 10000m  women
10 30.14 1990 10000m  women
11 30.14 1991 10000m  women
12 30.14 1992 10000m  women
13 29.32 1993 10000m  women
14 29.32 1994 10000m  women
...etc

(continue last result until 2015)
  • Could you please add the code you want us to help you with? – jay.sf Feb 17 '19 at 16:18
  • A possible duplicate of [Expand rows by date range using start and end date](https://stackoverflow.com/questions/24803361/expand-rows-by-date-range-using-start-and-end-date/54728153#54728153) – tmfmnk Feb 17 '19 at 16:20
  • `wr %>% complete(Year = 1982:2015) %>% fill(Result, Event, Gender, .direction = "down")` should do the trick (you need `library(tidyr)` for this). – kath Feb 17 '19 at 16:22

2 Answers


Assuming the input shown in the Note at the end, merge the input data frame with a data frame of all years and use na.locf from zoo to fill in the missing values.

library(zoo)
Year <- data.frame(Year = min(DF$Year):2015)
m <- na.locf(merge(DF, Year, all.y = TRUE), na.rm = FALSE)

giving (continued after output):

> m
   Year Result  Event Gender
1  1982  31.35 10000m  women
2  1983  31.35 10000m  women
3  1983  31.28 10000m  women
4  1984  31.14 10000m  women
5  1985  30.59 10000m  women
6  1986  30.14 10000m  women
7  1987  30.14 10000m  women
8  1988  30.14 10000m  women
9  1989  30.14 10000m  women
10 1990  30.14 10000m  women
11 1991  30.14 10000m  women
12 1992  30.14 10000m  women
13 1993  29.32 10000m  women
14 1994  29.32 10000m  women
15 1995  29.32 10000m  women
16 1996  29.32 10000m  women
17 1997  29.32 10000m  women
18 1998  29.32 10000m  women
19 1999  29.32 10000m  women
20 2000  29.32 10000m  women
21 2001  29.32 10000m  women
22 2002  29.32 10000m  women
23 2003  29.32 10000m  women
24 2004  29.32 10000m  women
25 2005  29.32 10000m  women
26 2006  29.32 10000m  women
27 2007  29.32 10000m  women
28 2008  29.32 10000m  women
29 2009  29.32 10000m  women
30 2010  29.32 10000m  women
31 2011  29.32 10000m  women
32 2012  29.32 10000m  women
33 2013  29.32 10000m  women
34 2014  29.32 10000m  women
35 2015  29.32 10000m  women

Or, if the complete file has multiple events and genders, split it by event and gender, apply the same processing to each component of the split, and rbind the pieces back together at the end. We can't tell from the question, so we have assumed that every event/gender should start at the minimum Year across all events and genders and end in 2015, but this assumption could easily be changed.

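# fill in the missing years for one event/gender subset (uses Year from above)
f <- function(x) na.locf(merge(x, Year, all.y = TRUE), na.rm = FALSE)
# apply f to each event/gender combination and stack the results
out <- do.call("rbind", by(DF, DF[c("Event", "Gender")], f))
rownames(out) <- NULL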
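
As a rough sketch of how that assumption could be changed, the variant below starts each event/gender at its own first year rather than the overall minimum. It uses base R only, so the helpers `locf` and `fill_group` are hypothetical names introduced here (they are not part of zoo), and the `toy` data is invented purely for illustration.

```r
# Toy data, invented purely for illustration (not real records).
toy <- data.frame(
  Result = c(10, 9, 20, 18),
  Year   = c(2010, 2013, 2012, 2014),
  Event  = c("A", "A", "B", "B"),
  Gender = "women",
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-NA value
  idx[idx == 0] <- NA        # leading NAs have nothing to carry forward
  x[!is.na(x)][idx]
}

# Hypothetical helper: expand one group from its own first year to end_year,
# then fill the gaps column by column.
fill_group <- function(g, end_year = 2015) {
  years <- data.frame(Year = min(g$Year):end_year)
  m <- merge(g, years, all.y = TRUE)   # missing years get NA elsewhere
  m[] <- lapply(m, locf)
  m
}

# Split by event and gender, process each piece, and stack the results.
pieces <- split(toy, list(toy$Event, toy$Gender), drop = TRUE)
out2 <- do.call(rbind, lapply(pieces, fill_group))
rownames(out2) <- NULL
```

Here group A runs from 2010 to 2015 and group B from 2012 to 2015, so each group only gets rows from its first recorded year onward.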

Note

Lines <- "
  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  29.32 1993 10000m  women"
DF <- read.table(text = Lines)
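
If installing zoo is not an option, the same merge-then-carry-forward idea can probably be sketched in base R. This is only an illustration under stated assumptions: `locf` is a hypothetical helper written here as a stand-in for `na.locf(..., na.rm = FALSE)`, and the data frame is rebuilt with `data.frame()` so the block runs on its own.

```r
# Same data as in the Note, built directly so the block is self-contained.
DF <- data.frame(
  Result = c(31.35, 31.35, 31.28, 31.14, 30.59, 30.14, 29.32),
  Year   = c(1982, 1983, 1983, 1984, 1985, 1986, 1993),
  Event  = "10000m",
  Gender = "women",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from zoo): carry the last non-NA value forward.
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-NA value
  idx[idx == 0] <- NA        # leading NAs stay NA
  x[!is.na(x)][idx]
}

all_years <- data.frame(Year = min(DF$Year):2015)
m2 <- merge(DF, all_years, all.y = TRUE)   # adds rows for the missing years
m2[] <- lapply(m2, locf)                   # fill every column downward
```

The result `m2` should have one row per year from 1982 to 2015, plus the extra 1983 row, with the gaps filled by the most recent record.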
G. Grothendieck

You can first complete the dataset by specifying the years you want to fill in, and then fill in the previous values for each missing year with fill. Group by the event and gender so that the values are filled in correctly for each combination.

library(tidyr)
library(dplyr)

wr %>%
  group_by(Event, Gender) %>% 
  complete(Year = min(Year):2015) %>% 
  fill(Result, .direction = "down")

# A tibble: 35 x 4
# Groups:   Event, Gender [1]
#    Event  Gender  Year Result
#    <fct>  <fct>  <int>  <dbl>
#  1 10000m women   1982   31.4
#  2 10000m women   1983   31.4
#  3 10000m women   1983   31.3
#  4 10000m women   1984   31.1
#  5 10000m women   1985   30.6
#  6 10000m women   1986   30.1
#  7 10000m women   1987   30.1
#  8 10000m women   1988   30.1
#  9 10000m women   1989   30.1
# 10 10000m women   1990   30.1
# ... with 25 more rows
kath