R How to mutate a subset of rows

Question

I am having trouble mutating a subset of rows in dplyr. I am using the chaining command: %>% to say:

data <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX")

This works fine but the problems is that I want to be able to select the entire original table and see the mutate applied to only the subset of data I had specified. My problem is that when I view data after this I only see the subset of data and its updated ColB information.

I would also like to know how to do this using data.table.

Thanks.

I found https://stackoverflow.com/a/34096575/7662327 gives a comprehensive exploration for modifying selected rows. — Shixiang Wang, Oct 18 '19 at 08:03

score 15 · Answer 1 · answered Apr 23 '15 at 23:28

When you use filter() you are actually removing the rows that do not match the condition you specified, so they will not show up in the final data set.

Does ColB already exist in your data frame? If so,

data %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", ColB))

will change ColB to "XXXX" when ColA == "ABC" and leave it as is otherwise. If ColB does not already exist, then you will have to specify what to do for rows where ColA != "ABC", for example:

data %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", NA))

score 14 · Accepted Answer · answered Apr 23 '15 at 23:34

14

Using data.table, we'd do:

setDT(data)[colA == "ABC", ColB := "XXXX"]

and the values are modified in-place, unlike if-else, which'd copy the entire column to replace just those rows where the condition satisfies.

We call this sub-assign by reference. You can read more about it in the new HTML vignettes.

answered Apr 23 '15 at 23:34

Arun

116,683
26
284
387

Thank you. This worked. And how do I do a mulitple criteria filter? I tried the following but it does not seem to be the correct syntax.`setDT(data)[(colA == "ABC") & (colC =="DEF"), ColB := "XXXX"]` – mo_maat Apr 24 '15 at 02:45
@user1991118 that *is* the correct syntax - perhaps you can show a reproducible example. The only thing I can think of is maybe you meant to use `|` instead of `&`. – eddi Apr 24 '15 at 04:22
That was indeed the correct syntax. I had used the wrong variable. It is working. – mo_maat Apr 24 '15 at 16:09

krlmlr · Answer 3 · 2015-04-23T23:49:30.503

Another option is to perform a subsequent combination of union and anti-join with the same data. This requires a primary key:

data <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX") %>%
  rbind_list(., anti_join(data, ., by = ...))

Example:

mtcars_n <- mtcars %>% add_rownames
mtcars_n %>%
  filter(cyl > 6) %>%
  mutate(mpg = 1) %>%
  rbind_list(., anti_join(mtcars_n, ., by = "rowname"))

This is much slower than probably any other approach, but useful to get quick results by extending your existing pipe.

Eduardo Zanette · Answer 4 · 2022-06-02T21:20:05.927

Just updating (by June 02nd 2022) @krlmlr great answer:

add_rownames() is deprecated, use tibble::rownames_to_column() instead. rbind_list is also deprecated, use bind_rows instead

You might also find a different sequence of rows in your resulting joined dataset, which depending on your aim is quite difficult to correct with dplyr::arrange() afterwards.

An alternative, although slower, is:

mtcars_n <- mtcars %>% 
  add_rownames() %>% 
  filter(cyl > 6) %>%
  mutate(new_col = 1)
mtcars_m <- left_join(x=mtcars, y=mtcars_n)

R How to mutate a subset of rows

4 Answers4

Linked

Related