2

I currently have the dataset:

Time Var1 Var2 Cat
1    14   16   1
2    16   98   1
4    52   68   1
7    17   12   1
1    57   58   2
3    45   35   2 
4    33   12   2
7    77   1    2

And wish to be able to create continuous time variable for each of category, like so:

   Time Var1 Var2 Cat
    1    14   16   1
    2    16   98   1
    3    NA   NA   1
    4    52   68   1
    5    NA   NA   1
    6    NA   NA   1
    7    17   12   1
    1    57   58   2
    2    NA   NA   2
    3    45   35   2 
    4    33   12   2
    5    NA   NA   2
    6    NA   NA   2
    7    77   1    2

I am struggling to be able to impute all the data rows for each category.

All help is greatly appreciated.

mt1022
  • 16,834
  • 5
  • 48
  • 71
James Todd
  • 113
  • 1
  • 3
  • 12
  • data.table solution: `library(data.table); d1 <- setDT(df)[, .(Time = seq(min(Time), max(Time))), by = Cat]; df[d1, on = names(d1)]` - Taken from my question/answer [here](https://stackoverflow.com/questions/46712379/efficient-way-to-fill-time-series-per-group) – Sotos Dec 20 '17 at 14:22

2 Answers2

2

Perhaps you can use something like complete or expand from the "tidyverse" along with a full_join:

Try:

library(tidyverse)
mydf %>% complete(Time = full_seq(Time, 1), Cat)

Or:

mydf %>% 
  expand(Time = 1:7, Cat) %>% 
  full_join(mydf) %>% 
  arrange(Cat, Time)

Alternative functions to consider would be expand.grid in base R, and CJ in "data.table", and then merge.

Here's an example with "data.table" (similar to what @Sotos suggested):

setDT(mydf)[with(mydf, CJ(Time = min(Time):max(Time), Cat = unique(Cat))), 
            on = c("Time", "Cat")][order(Cat, Time)]
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • This would be a little easier with `df %>% complete(Time = 1:7, Cat)` but probably a duplicate – talat Dec 20 '17 at 14:18
  • 1
    @docendodiscimus, Had just occurred to me, but expand had come to mind first because I was thinking of the `expand.grid` + `merge` approach. – A5C1D2H2I1M1N2O1R2T1 Dec 20 '17 at 14:20
2

Base R solution:

df <- read.table(text = "Time Var1 Var2 Cat
1    14   16   1
2    16   98   1
4    52   68   1
7    17   12   1
1    57   58   2
3    45   35   2 
4    33   12   2
7    77   1    2", header = T)

df2 <- expand.grid(Time = 1:7, Cat = 1:2)

df3 <- merge(df, df2, by = c("Time", "Cat"), all = T)
df3 <- df3[order(df3$Cat, df3$Time), c(1, 3, 4, 2)]

> df3
   Time Var1 Var2 Cat
1     1   14   16   1
3     2   16   98   1
5     3   NA   NA   1
7     4   52   68   1
9     5   NA   NA   1
11    6   NA   NA   1
13    7   17   12   1
2     1   57   58   2
4     2   NA   NA   2
6     3   45   35   2
8     4   33   12   2
10    5   NA   NA   2
12    6   NA   NA   2
14    7   77    1   2
LAP
  • 6,605
  • 2
  • 15
  • 28