1

Others asked questions how to use a complete function to fill gaps, but I want to group by several others columns and then fill the missing days.

I have a dataframe with many columns, for simplicity, I only show 4 columns like this one

data <- data.frame(groups = c('A', 'A','A','A','A','A','A','A','A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'),
                   date = c(6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535, 6535),
                   days = c(1, 2, 4, 7, 11, 12, 13, 14, 15, 1, 3, 4, 5, 6, 9, 10, 12, 13, 14),
                   price = c(64, 82, 31, 77, 21, 39, 79, 67, 45, 66, 24, 60, 64, 56, 41, 29, 60, 57, 68))

I want to create new rows for missing days so that I get a table like this one

   groups date days price
1       A 6535    1    64
2       A 6535    2    82
3       A 6535    3    NA
4       A 6535    4    31
5       A 6535    5    NA
6       A 6535    6    NA
7       A 6535    7    77
8       A 6535    8    NA
9       A 6535    9    NA
10      A 6535   10    NA
11      A 6535   11    21
12      A 6535   12    39
13      A 6535   13    79
14      A 6535   14    67
15      A 6535   15    45
16      B 6535    1    66
17      B 6535    2    NA
18      B 6535    3    24
19      B 6535    4    60
20      B 6535    5    64
21      B 6535    6    56
22      B 6535    7    NA
23      B 6535    8    NA
24      B 6535    9    41
25      B 6535   10    29
26      B 6535   11    NA
27      B 6535   12    60
28      B 6535   13    57
29      B 6535   14    68
30      B 6535   15    NA

So I want to group by groups and date, then scan through the days column (with a range of 1 to 15) and add a row for missing days that copies rows of groups and date based on the group by but the price column will have a value of NA.

Zmnako Awrahman
  • 538
  • 7
  • 19
  • please share the code you've written so far and the error you're getting. also, please keep the scope of your problem limited to either pandas or r to avoid confusion. – Ashish Acharya May 08 '18 at 14:49
  • I have not written any codes and I do not know how to do it. I updated the questions to include only R solutions – Zmnako Awrahman May 08 '18 at 14:50

1 Answers1

2

Check out tidyr function called complete

library(dplyr)
library(tidyr)
data %>% group_by(date,groups) %>% complete(days = 1:15)


# # A tibble: 30 x 4
# # Groups: date, groups [2]
#     date groups  days price
#    <dbl> <fct>  <dbl> <dbl>
#  1  6535 A       1.00  64.0
#  2  6535 A       2.00  82.0
#  3  6535 A       3.00  NA  
#  4  6535 A       4.00  31.0
#  5  6535 A       5.00  NA  
#  6  6535 A       6.00  NA  
#  7  6535 A       7.00  77.0
#  8  6535 A       8.00  NA  
#  9  6535 A       9.00  NA  
# 10  6535 A      10.0   NA  
# # ... with 20 more rows
jasbner
  • 2,253
  • 12
  • 24