Insert rows in dataframe based on condition - the Tidyverse way

Question

Here is a data frame:

library(tidyverse)

# 5 companies observed each day for 10 days
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(sample(100, 5), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df

Now something happens to the data and some of the company E rows are removed.

df_error <- df[-c(5, 10, 15, 20), ]
df_error

What is the simplest Tidyverse way to add back the E rows? The value doesn't matter. The date of each missing E row is the same as that of the D row above it.

I started with the following and wasn't sure how to proceed:

# Find all D occurrences
d_idx <- which(df_error$company == "D")
d_idx

# If the row after a D is not an E, keep that D's index. An E row is needed below each of these.
rows_need_e_below <- d_idx[!(df_error$company[d_idx + 1] %in% "E")]
rows_need_e_below

score 2 · Accepted Answer · answered Sep 23 '20 at 02:29

If you know that your data should contain companies "A" to "E", you can use complete:

tidyr::complete(df_error, date, company = LETTERS[1:5])

Or more generally:

unique_company <- c('A', 'B', 'C', 'D', 'E')
tidyr::complete(df_error, date, company = unique_company)

# A tibble: 50 x 3
#   date       company value
#   <date>     <chr>   <int>
# 1 2020-01-01 A          87
# 2 2020-01-01 B           5
# 3 2020-01-01 C          40
# 4 2020-01-01 D          67
# 5 2020-01-01 E          NA
# 6 2020-01-02 A          87
# 7 2020-01-02 B           5
# 8 2020-01-02 C          40
# 9 2020-01-02 D          67
#10 2020-01-02 E          NA
# … with 40 more rows
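
If the set of companies is not known in advance, a possible variant is to pass the bare column name, so that complete expands over the values already present in the data. This is only a sketch: it assumes every company still appears at least once in df_error (true here, since E survives on the later dates), and the set.seed call is an addition for reproducibility that is not part of the original question.

```r
library(tibble)
library(tidyr)

# Rebuild the example data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(sample(100, 5), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df_error <- df[-c(5, 10, 15, 20), ]

# A bare column name makes complete() use the unique values observed in that column
df_fixed <- complete(df_error, date, company)

nrow(df_fixed)             # 10 dates x 5 companies = 50 rows
sum(is.na(df_fixed$value)) # the 4 restored E rows have NA values
```

If a company were missing from every date, this variant would not restore it, and the explicit vector shown above would be the safer choice.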

By default, the value column is filled with NA in the new rows. To fill it with a specific value instead, use the fill parameter of complete. For example, to fill with 0s:

tidyr::complete(df_error, date, company = unique_company, fill = list(value = 0))
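
To check which rows were actually added back, one option is to compare the completed data against df_error with dplyr::anti_join. The sketch below rebuilds the example so that it runs on its own. The names df_filled and restored are illustrative, and the seed is an assumption added for reproducibility.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(sample(100, 5), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df_error <- df[-c(5, 10, 15, 20), ]

# Restore the missing date/company combinations, filling value with 0
df_filled <- complete(df_error, date, company = LETTERS[1:5], fill = list(value = 0))

# Keep only the rows of df_filled that have no match in df_error
restored <- anti_join(df_filled, df_error, by = c("date", "company"))
restored
```

Here restored should contain only the four E rows for 2020-01-01 through 2020-01-04, each dated like the D row that precedes it.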
