6

I have a dataframe that I need to group by a combination of columns entries in order to conditionally mutate several columns using only an if statement (without an else condition).

More specifically, I want to sum up the column values of a certain group if they cross a pre-defined threshold, otherwise the values should remain unchanged.

I have tried doing this using both if_else and case_when but these functions require either a "false" argument (if_else) or by default set values that are not matched to NA (case_when):

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length)),
                Sepal.Width=if_else(sum(Sepal.Width)>170, sum(Sepal.Width)),
                Petal.Length=if_else(sum(Petal.Length)>70, sum(Petal.Length)),
                Petal.Width=if_else(sum(Petal.Width)>15, sum(Petal.Width)))

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=case_when(sum(Sepal.Length)>250 ~ sum(Sepal.Length)),
                Sepal.Width=case_when(sum(Sepal.Width)>170 ~ sum(Sepal.Width)),
                Petal.Length=case_when(sum(Petal.Length)>70 ~ sum(Petal.Length)),
                Petal.Width=case_when(sum(Petal.Width)>15 ~ sum(Petal.Width)))

Any ideas how to do this instead?

Edit:

Here is an example for the expected output. The sum of the petal width for all species-wise grouped entries is 12.3 for setosa, 101.3 for virginica and 66.3 for versicolor. If I require that this sum should be at least 15 for the values to be summed up (otherwise the original value should be kept), then I expect the following output (only showing the columns "Petal.Width" and "Species"):

Petal.Width    Species
1           0.2     setosa
2           0.2     setosa
3           0.2     setosa
4           0.2     setosa
5           0.2     setosa
6           0.4     setosa
7           0.3     setosa
8           0.2     setosa
9           0.2     setosa
10          0.1     setosa
#...#
50          0.2     setosa
51          66.3 versicolor
52          66.3 versicolor
53          66.3 versicolor
#...#
100         66.3 versicolor
101         101.3  virginica
102         101.3  virginica
103         101.3  virginica
#...#
150         101.3  virginica
atreju
  • 965
  • 6
  • 15
  • 36
  • 1
    If "otherwise the values should remain unchanged", you can use the original column value as false or default value. – LocoGris Jan 28 '19 at 14:32
  • But how can I reference the "original column value" after the grouping? If I use, e.g., "Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length), Sepal.Length)", then I get an error because "Sepal.Length" is (due to the grouping) a vector. – atreju Jan 28 '19 at 14:34
  • Can you please share an example of what you expect the output to look like? – Tom Haddow Jan 28 '19 at 14:36
  • A sum produces one number, while the original is a vector. So I can't see how you can have both in the final output. Either the sum gets duplicated, or you reduce the original column to one number if the threshold is not crossed. Either way, you need to show what your final output should look like. – acylam Jan 28 '19 at 14:43
  • 1
    @avid_useR The output still keeps the same number of rows as he is using mutate not summarise. – Tom Haddow Jan 28 '19 at 14:49
  • 2
    @TomHaddow I am aware of that. That's why I said "Either the sum gets duplicated". This is intended to show OP that his/her question is not entirely clear. – acylam Jan 28 '19 at 14:56

1 Answers1

4

I think you are after this? Using Johnny's method. You shouldn't hit an error when you use the original value as part of case_when in the case when the sum is not greater than the cutoff...

iris_mutated <- iris %>% 
  group_by(Species) %>% 
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                   T ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                   T ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                   T ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                   T ~ Petal.Width))
Tom Haddow
  • 230
  • 1
  • 10
  • How does this retrieve the other species except setosa? – NelsonGon Jan 28 '19 at 14:54
  • This solution works, thank you very much. I changed the Petal.Width threshold in the original question from 12 to 15 to make the expected output more clear, though (in case someone is wondering why the output you get from the accepted answer is not the same as the expected output). – atreju Jan 28 '19 at 15:05
  • Just made the change in the answer to prevent any confusion. – Tom Haddow Jan 28 '19 at 15:41