Maintaining a consistent set of rows after a subset

Question

I have a scenario where I need to present a consistent table of data, refreshed frequently.

My source data may look like:

Item     | Quantity
-------------------
Dog      | 2
Cat      | 1
Apple    | 6
Banana   | 2
Kiwi     | 4

I only require a few items from my source, which may or may not appear. Currently I'm using subset to select the items of interest:

groceries <- subset(data, item == "Apple"  | 
                           item == "Orange" | 
                           item == "Banana" |
                           item == "Kiwi"
)

Which results in:

Item     | Quantity
-------------------
Apple    | 6
Banana   | 2
Kiwi     | 4

However, I need include blank rows for those items that arnt included in the source, so that my table is consistent between refreshes:

Item     | Quantity
-------------------
Apple    | 6
Orange   |
Banana   | 2
Kiwi     | 4

Is anyone able to guide me how best to approach the above, please?

Instead of subseting create a data frame with the items of interest and merge with your original. Something like `merge(df, data.frame(Item = c(....)), by = 'Item')` — Sotos, Jul 16 '20 at 08:40

score 3 · Accepted Answer · answered Jul 16 '20 at 08:44

We can use merge with all.x = TRUE:

# data
df1 <- read.table(text = "Item Quantity
Dog 2
Cat 1
Apple 6
Banana 2
Kiwi 4", header = TRUE)

# lookup table
x <- data.frame(Item = c("Apple", "Orange", "Banana", "Kiwi"))

# merge
merge(x, df1, by = "Item", all.x = TRUE)
#     Item Quantity
# 1  Apple        6
# 2 Banana        2
# 3   Kiwi        4
# 4 Orange       NA

Darren Tsai · Answer 2 · 2020-07-16T09:17:47.107

Another choice without merging two data frames.

library(dplyr)
item <- c("Apple", "Orange", "Banana", "Kiwi")

df1 %>%
  filter(Item %in% item) %>%
  tidyr::complete(Item = item)

# # A tibble: 4 x 2
#   Item   Quantity
#   <chr>     <int>
# 1 Apple         6
# 2 Banana        2
# 3 Kiwi          4
# 4 Orange       NA

A base solution with match().

data.frame(Item = item, Quantity = df1$Quantity[match(item, df1$Item)])

#     Item Quantity
# 1  Apple        6
# 2 Orange       NA
# 3 Banana        2
# 4   Kiwi        4

Parag Gupta · Answer 3 · 2020-07-16T09:13:20.447

Same thing can be achieved using a left_join() from dplyr Package

#raw data
raw_data <- data.frame(raw_quantity = c(1,2,3), item = c("Apple", "Dog", "Lime"))

#parent file
parent_file <- data.frame(parent_quantity = c(0,2,3), item = c("Apple", "Banana", "Kiwi"))

library(dplyr)
x <- left_join(parent_file, raw_data, by = "item")

Note that left_join preserves all the rows that are at the left side of the join, which in this case is parent_file and will create an extra column raw_quantity for documentation, which can be easily removed by

x <- x[, c("item", "parent_quantity")]

Maintaining a consistent set of rows after a subset

3 Answers3