dplyr broadcasting single value per group in mutate

Question

I am trying to do something very similar to "Scale relative to a value in each group (via dplyr)" (however, that solution seems to crash R for me). I would like to take a single value from each group and add a new column in which this value is repeated on every row of the group. As an example, I have

library(dplyr)

data = expand.grid(
  category = LETTERS[1:2],
  year = 2000:2003)
data$value = runif(nrow(data))

data

  category year     value
1        A 2000 0.6278798
2        B 2000 0.6112281
3        A 2001 0.2170495
4        B 2001 0.6454874
5        A 2002 0.9234604
6        B 2002 0.9311204
7        A 2003 0.5387899
8        B 2003 0.5573527

And I would like a dataframe like

data

  category year     value    value2
1        A 2000 0.6278798 0.6278798
2        B 2000 0.6112281 0.6112281
3        A 2001 0.2170495 0.6278798
4        B 2001 0.6454874 0.6112281
5        A 2002 0.9234604 0.6278798
6        B 2002 0.9311204 0.6112281
7        A 2003 0.5387899 0.6278798
8        B 2003 0.5573527 0.6112281

i.e. value2 for each category is that category's value from year 2000. I was trying to think of a general solution that extends to any given filtering criterion, i.e. something like

data %>% group_by(category) %>% mutate(value = filter(data, year==2002))

however this does not work because of incorrect length in the assignment.

Gregor Thomas · Accepted Answer · 2015-12-03T21:26:01.240

16

Do this:

data %>% group_by(category) %>%
  mutate(value2 = value[year == 2000])

You could also do it this way:

data %>% group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = value[1])

or

data %>% group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = first(value))

or

data %>% group_by(category) %>%
  mutate(value2 = nth(value, n = 1, order_by = year))

or probably several other ways.
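
One caveat with `value[year == 2000]` is that it assumes every group has exactly one row matching the condition; when a group has no match the subset has length 0, and recent dplyr versions are expected to reject that in `mutate`. A minimal sketch of a more forgiving variant, using base R's `match()` so that a group without a match gets `NA` (the three-category data frame below is made up for illustration and is not the asker's data):

```r
library(dplyr)

# Hypothetical example data: category "C" has no row for the year 2000.
data <- data.frame(
  category = c("A", "A", "B", "B", "C"),
  year     = c(2000, 2001, 2000, 2001, 2001),
  value    = c(0.1, 0.2, 0.3, 0.4, 0.5)
)

result <- data %>%
  group_by(category) %>%
  # match() gives the position of the first row with year 2000, or NA if
  # there is none; indexing with NA yields NA instead of a length-0 vector.
  mutate(value2 = value[match(2000, year)]) %>%
  ungroup()

result
```

If several rows in a group match, this keeps the first one, which may or may not be what you want.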

Your attempt with mutate(value = filter(data, year==2002)) doesn't make sense for a few reasons.

When you explicitly pass in data again, it's not part of the chain that got grouped earlier, so it doesn't know about the grouping.
All dplyr verbs take a data frame as first argument and return a data frame, including filter. When you do value = filter(...) you're trying to assign a full data frame to the single column value.
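
If you do want to build the solution around `filter`, one option is to use it as intended, to produce a small lookup data frame, and then join that back onto the original. This is only a sketch: the name `reference` is made up here, and `value` is filled with a deterministic sequence instead of `runif()` so the result is reproducible.

```r
library(dplyr)

data <- expand.grid(category = LETTERS[1:2], year = 2000:2003)
data$value <- seq_len(nrow(data)) / 10   # deterministic stand-in for runif()

# One row per category: the value observed in the reference year.
reference <- data %>%
  filter(year == 2000) %>%
  select(category, value2 = value)

# Joining on category repeats the reference value on every row of that category.
joined <- left_join(data, reference, by = "category")

joined
```

The join also copes with categories that have no row in the reference year: they simply get `NA` in `value2`.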

edited Dec 03 '15 at 21:26

answered Dec 03 '15 at 20:47

Gregor Thomas


Ahh okay, yes, I knew there was something fishy about passing data into filter() again, but I could not think of another way to do this. In your first example, am I correct in assuming that under the hood something of the form data[data$year==2002,] is happening, and that because this is within the context of a group it knows how to broadcast these values? – mgilbert Dec 03 '15 at 21:05

When things are grouped, think of it as having an individual data frame for each group, so it starts with `sub_df = data[data$category == "A", ]`. From there, `dplyr` knows the column names, so for `value[year == 2000]` it looks inside `sub_df` and evaluates `year == 2000`, which returns a logical vector that is TRUE for the rows where year is 2000. That logical vector is then used to subset `value`, the corresponding column of `sub_df`. – Gregor Thomas Dec 03 '15 at 21:13

data.table does this more explicitly, referring to the per-group sub-data-frames as `.SD` (short for **S**ubset of **D**ata). – Gregor Thomas Dec 03 '15 at 21:14
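
The "one data frame per group" picture from the comments can be spelled out in base R. This is a mental model only, not a claim about how dplyr is implemented internally; `pieces` and `by_hand` are names made up for the sketch, and `value` is a deterministic stand-in for `runif()`.

```r
data <- expand.grid(category = LETTERS[1:2], year = 2000:2003)
data$value <- seq_len(nrow(data)) / 10   # deterministic stand-in for runif()

# Split into one data frame per category, as a mental model of group_by().
pieces <- split(data, data$category)

pieces <- lapply(pieces, function(sub_df) {
  keep <- sub_df$year == 2000            # logical vector, TRUE where year is 2000
  sub_df$value2 <- sub_df$value[keep]    # length-1 result is recycled to every row
  sub_df
})

# unsplit() puts the rows back in their original order.
by_hand <- unsplit(pieces, data$category)

by_hand
```

The recycling of the length-1 result inside each piece is the "broadcasting" asked about in the question.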
