
In an R data.frame, I would like to find the missing years by group, add a row for each missing year, and repeat the last value.

An example

This is a data.frame

    GROUP  YEAR1  YEAR2  YEAR3
    A        100    190     NA
    A         90     NA    300
    B        200     70     NA

I want:

    GROUP  YEAR1  YEAR2  YEAR3
    A        100    190    190
    A         90     90    300
    B        200     70     70
diego
  • Here is [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Nov 10 '17 at 13:51

2 Answers


You can use complete from tidyr to complete the sequence, and then fill to fill the NAs per group, i.e.

library(tidyverse)

df %>% 
 complete(YEAR, GROUP) %>% 
 group_by(GROUP) %>% 
 fill(VALUE)

which gives,

# A tibble: 4 x 3
# Groups:   GROUP [2]
   YEAR  GROUP VALUE
  <int> <fctr> <int>
1  2000      A   190
2  2001      A   200
3  2000      B    70
4  2001      B    70
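
Since the thread never shows the input, here is a minimal self-contained sketch of the same `complete` + `fill` idea. The input data frame is an assumption, reconstructed so that group B is missing its 2001 row; `filled` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Assumed input (not shown in the thread): group B has no row for 2001
df <- data.frame(
  GROUP = c("A", "A", "B"),
  YEAR  = c(2000L, 2001L, 2000L),
  VALUE = c(190L, 200L, 70L)
)

filled <- df %>%
  complete(GROUP, YEAR) %>%  # add a row for every missing GROUP/YEAR pair, VALUE = NA
  group_by(GROUP) %>%
  fill(VALUE) %>%            # carry the last observed VALUE down within each group
  ungroup()

filled
```

Listing `GROUP` first in `complete` keeps each group's years together, so `fill` walks through them in year order.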

EDIT

As per your new requirements, it seems as though you only need to fill NAs rowwise. In that case, a simple base R solution could be,

as.data.frame(t(apply(df, 1, function(i) zoo::na.locf(i))))
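
One thing to watch with a row-wise `apply` over the whole data frame: the `GROUP` column forces every value to character. A possible sketch that only touches the year columns, using the data from the question, is below. The `locf` helper is hypothetical and written in base R here (it is not the zoo function), and `year_cols` is an illustrative name.

```r
# Data from the question, in wide form
df <- data.frame(
  GROUP = c("A", "A", "B"),
  YEAR1 = c(100, 90, 200),
  YEAR2 = c(190, NA, 70),
  YEAR3 = c(NA, 300, NA)
)

# Hypothetical helper: last observation carried forward on one vector
locf <- function(x) {
  idx <- cumsum(!is.na(x))       # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]   # leading NAs (idx == 0) stay NA
}

# Fill each row left to right, leaving GROUP untouched
year_cols <- setdiff(names(df), "GROUP")
df[year_cols] <- t(apply(df[year_cols], 1, locf))
df
```

The year columns stay numeric because `GROUP` never enters the matrix that `apply` builds.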
Sotos
  • Don't put it here. Edit it into your question. – Sotos Nov 10 '17 at 13:44
  • Hi Sotos, excuse me, but I changed my question. I don't know how to write to you directly. I'm a beginner. For the moment, thank you very much. If you can reply to my new post it would be very helpful. – diego Nov 10 '17 at 13:50
  • Updated my answer. Have a look. – Sotos Nov 10 '17 at 13:57
  • Regarding the EDIT, you want the `apply` to only work on the numeric columns, so I think you want this: `library(zoo); replace(df, -1, apply(df[-1], 1, na.locf0))`. – G. Grothendieck Nov 10 '17 at 20:38

Another approach could be to use merge with expand.grid to pad missing rows and na.locf to fill NA.

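library(zoo)

# Build every GROUP/YEAR combination, then merge so missing pairs appear with VALUE = NA
df <- merge(expand.grid(GROUP = unique(df$GROUP), YEAR = unique(df$YEAR)), df, all = TRUE)
# Carry the last observed VALUE forward into the NA rows
df$VALUE <- na.locf(df$VALUE)
df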

Output is:

  GROUP YEAR VALUE
1     A 2000   190
2     A 2001   200
3     B 2000    70
4     B 2001    70
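
A column-wide fill can carry a value across a group boundary when a group's first year is the one that is missing. A hedged variant that fills within each group is sketched below in base R. The input data, the `locf` helper, and the names `grid` and `padded` are all assumptions for illustration.

```r
# Assumed input: A lacks 2001, B lacks 2000
df <- data.frame(
  GROUP = c("A", "B"),
  YEAR  = c(2000L, 2001L),
  VALUE = c(190L, 70L)
)

# Pad to every GROUP/YEAR combination; missing pairs get VALUE = NA
grid <- expand.grid(GROUP = unique(df$GROUP), YEAR = unique(df$YEAR),
                    stringsAsFactors = FALSE)
padded <- merge(grid, df, all = TRUE)
padded <- padded[order(padded$GROUP, padded$YEAR), ]

# Hypothetical helper: last observation carried forward, leading NAs kept
locf <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

# Fill within each group so B's missing 2000 does not inherit A's value
padded$VALUE <- ave(padded$VALUE, padded$GROUP, FUN = locf)
padded
```

Here A's 2001 row receives 190, while B's 2000 row stays `NA` because there is no earlier value in group B to repeat.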
Prem
  • @diego the updated dataset and expected output seem to have a typo: the header has 4 columns but the data has 5 columns. You may need to update the sample and expected dataset properly. – Prem Nov 10 '17 at 17:42