I was reading this earlier question: How do I "fill down"/expand observations with respect to a time variable?

I need the same thing for my dataset:

The answers there point to this one: Complete column with group_by and complete. I tried to replicate the code from those answers, but it didn't work.

My dataset looks like this (simplified; the real dataset has more variables, and its dimensions are 631230 obs. of 21 variables):

df

Year   ID          Name  Brunch Sales  Wages   Labor productivity
2014   1750941579   JEN    A     3       2           1.5
2015   1750941579   JEN    A     4       2           2
2016   1750941579   JEN    A     6       4           1.5
2017   1750941579   JEN    A     8       4           2
2018   1750941579   JEN    A     8       4           2
2014   1303477204   MIC    B     6       2           3
2015   1303477204   MIC    B     8       4           2

So I used this code:

DF <- complete(df, ID, Year = full_seq(Year, period = 1), fill = list(`Labor productivity` = 0))

and got something like this:

Year   ID           Name       Brunch     Sales  Wages   Labor productivity
2014   1750941579   JEN           A        3       2           1.5
2015   1750941579   JEN           A        4       2           2
2016   1750941579   JEN           A        6       4           1.5
2017   1750941579   JEN           A        8       4           2
2018   1750941579   JEN           A        8       4           2
2014   1303477204   MIC           B        6       2           3
2015   1303477204   MIC           B        8       4           2
2016   1303477204   #¿NOMBRE?     B        0       0           NaN
2017   1303477204    NA           NA       NA      NA          NA 
2018   1303477204    NA           NA       NA      NA          NA 

It completed the panel, as I wanted, but is there a way to keep Name, Brunch, and the other columns not listed here?

It's fine if the quantitative variables (Sales, Wages) are NA or 0; I don't mind. But I need to keep the qualitative variables (Name and Brunch), which are associated with the ID.

I also tried this code from the second link, adapted to my dataset:

DF<-df %>% 
  group_by(Year, ID) %>% 
  summarise(`Labor Productivity`=n()) %>% 
  ungroup() %>% 
  complete(Year, ID, fill = list(`Labor Productivity`=1))

but I only get this message: `summarise()` regrouping output by 'Year' (override with `.groups` argument)

and the output dataset looks like this:

Year   ID          Name  Labor productivity
2014   1750941579   JEN        1
2014   1303477204   MIC        1
2015   1750941579   JEN        1
2015   1303477204   MIC        1
2016   1750941579   JEN        1
2016   1303477204   MIC        1

And so on... (dimensions: 631230 obs. of 3 variables)

So, second question: What's wrong with this code?
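
A minimal sketch of what that pipeline does, using a small made-up tibble (`toy`, `n_rows`, `counted` and `completed` are illustrative names, not part of the real dataset). `summarise()` returns only the grouping columns plus the new summary column, so Name, Sales and everything else are dropped before `complete()` runs. The `.groups` text is an informational message, not an error.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the real df
toy <- tibble(
  Year  = c(2014, 2015, 2016, 2014, 2015),
  ID    = c(1, 1, 1, 2, 2),
  Name  = c("JEN", "JEN", "JEN", "MIC", "MIC"),
  Sales = c(3, 4, 6, 6, 8)
)

# summarise() keeps the grouping columns and the summary only;
# .groups = "drop" silences the regrouping message
counted <- toy %>%
  group_by(Year, ID) %>%
  summarise(n_rows = n(), .groups = "drop")

names(counted)  # Name and Sales are no longer present

# complete() adds the missing (Year, ID) pair and fills its count with 1,
# so added rows look exactly like observed rows
completed <- counted %>%
  complete(Year, ID, fill = list(n_rows = 1))
```

Because the fill value equals the count of every observed row, the completed rows cannot be told apart from real ones, which would be consistent with every row showing 1 in the output above.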

zx8754
Jorge Paredes
1 Answer

You could use `fill()` on the variables that you want to keep, grouped by ID, after completing the panel.

library(dplyr)
library(tidyr)

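df %>%
  # add the missing ID/Year combinations
  complete(ID, Year = full_seq(Year, period = 1), fill = list(Labor_productivity = 0)) %>%
  group_by(ID) %>%
  # carry the ID-level values into the newly added rows
  fill(Name, Brunch)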
Ronak Shah
  • Can you share some more details? What do you mean by `'didn't work'` ? What did you expect and what does my answer return? – Ronak Shah Feb 14 '21 at 10:22
  • Your answer gives me the same result as the second data frame I showed, `DF`. – Jorge Paredes Feb 14 '21 at 18:29
  • I know why it didn't work! I had to delete this part, `fill=list(labor_productivity=0)`, and before the last `fill` I added this: `df$Name[df$Name=="#¿NOMBRE?"]<-NA`. Then I changed the last `fill` to `fill(Name, Brunch,.direction="downup")`. Thank you!! – Jorge Paredes Feb 14 '21 at 19:11
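
Putting the answer and the last comment together, a minimal self-contained sketch might look like the following. The data is a cut-down reconstruction of the example in the question (only Sales and Wages as quantitative columns), and it assumes the `#¿NOMBRE?` placeholder sits in a 2016 row for the second ID; `panel` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Reconstructed toy data (an assumption, not the real 631230-row dataset)
df <- tibble(
  Year   = c(2014, 2015, 2016, 2017, 2018, 2014, 2015, 2016),
  ID     = c(rep(1750941579, 5), rep(1303477204, 3)),
  Name   = c(rep("JEN", 5), "MIC", "MIC", "#¿NOMBRE?"),
  Brunch = c(rep("A", 5), rep("B", 3)),
  Sales  = c(3, 4, 6, 8, 8, 6, 8, 0),
  Wages  = c(2, 2, 4, 4, 4, 2, 4, 0)
)

panel <- df %>%
  # turn the spreadsheet error placeholder into a real NA so fill() can replace it
  mutate(Name = na_if(Name, "#¿NOMBRE?")) %>%
  # add one row per ID for every year in the observed range
  complete(ID, Year = full_seq(Year, period = 1)) %>%
  group_by(ID) %>%
  # copy the ID-level labels into the new rows, searching down and then up
  fill(Name, Brunch, .direction = "downup") %>%
  ungroup()
```

The quantitative columns stay NA in the added rows, which the question says is acceptable. `.direction = "downup"` matters when the first rows of an ID are the ones missing a label, since a plain downward fill has nothing to copy from there.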