R version of SUMIFS or AVERAGEIFS?

Question

I'm trying to conditionally find the average of column 1/column2 based on T/F values in col3 and col 4. I have 1000 rows of data so I was wondering in R had a way to all of them.

kg  ha  condition 1  condition 2 
2   1   True         True
3   0.5 True         False 
4   1   False        True 
5   0.5 False        False

I'm VERY new to R so I'd appreciate if you could use the data set I provided so I can follow along more easily. Thank you so much!

Can you show the expected output. If it is False, then would that be NA — akrun, Nov 20 '19 at 20:53
Is this a variation of your previous question? https://stackoverflow.com/questions/58961059/how-can-i-calculate-a-conditional-sum-with-true-false-information Did none of those answers help? — MrFlick, Nov 20 '19 at 20:59

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

One option is to replace the values the first two columns based on the logical values in 'condition1, '2', and then get the rowMeans

rowMeans(replace(df1[1:2], df1[3:4] == "False", NA), na.rm = TRUE)
#[1] 1.5 3.0 1.0 NaN

Instead of doing the ==, it can also be converted to logical with as.logical

If it is the sums, replace the rowMeans with rowSums

If the 'condition1', 'condition2' are grouping columns, then

aggregate(.~ condition1 + condition2, df1, FUN = mean, na.rm = TRUE)

In R, boolean values are TRUE/FALSE and not True/False

data

df1 <- structure(list(kg = 2:5, ha = c(1, 0.5, 1, 0.5), condition1 = c("True", 
"True", "False", "False"), condition2 = c("True", "False", "True", 
"False")), class = "data.frame", row.names = c(NA, -4L))

The `tidyverse` and `dplyr` answers are very popular these days, which is the only reason I am holding off on upvoting this answer; I tend to prefer being shown the base-R ways of doing things, although I don't get the sense that's what "the community" is looking for. But this is the "best" answer, including mine. — d8aninja, Nov 20 '19 at 21:20

score 1 · Answer 2 · answered Nov 20 '19 at 21:00

dplyr package is perfect for this task. I'll assume your data is saved as My_DF

library(dplyr)
My_DF %>%
group_by(condition 1, condition 2) %>%
summarize( avg_kg = mean(kg, na.rm = TRUE),
           avg_ha = mean(ha, na.rm = TRUE) )

This code take your dataframe, and generates means for variable kg and ha for each combination of condition 1 and condition 2.

The %>% is called a pipe, it passes the result of the previous line as an input to the next line.

This site contains some helpful one page guides to dplyr and basic R: https://rstudio.com/resources/cheatsheets/

Beat me to it. I'll add mine so they have some context for building `My_DF` — d8aninja, Nov 20 '19 at 21:14

score 1 · Answer 3 · answered Nov 20 '19 at 21:16

You'll want to acquaint yourself with the tidyverse package, or more specifically dplyr, as Andre mentions. I'll provide slightly more context around making his example My_DF:

library(tidyverse)

df <- data.frame(
  col_one = sample(1:10, 25, replace = TRUE)
)
df$col_two <- {df$col_one > 5}

df %>% 
  group_by(col_two) %>% 
  summarise(
    totals = sum(col_one))
  )

which will give you something like

> glimpse(df)
Observations: 25
Variables: 2
$ col_one <int> 3, 7, 10, 1, 7, 7, 10, 6, 3, 8, 7, 3, 4, 8, 1, 2, 2, 6, 5, 6, 3, …
$ col_two <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TR…
> df %>% 
+   group_by(col_two) %>% 
+   summarise(
+     totals = sum(col_one))
# A tibble: 2 x 2
  col_two totals
  <lgl>    <int>
1 FALSE       35
2 TRUE        97

score 1 · Answer 4 · answered Nov 20 '19 at 21:39

If you are just trying to sum() or average (mean()) kg and ha when conditions are true, this can be done very simply.

Data: (note I removed the spaces from condition 1 and condition 2.

data <- data.frame(kg = 2:5,
                   ha = c(1, 0.5, 1, 0.5),
                   condition1 = c(T, T, F, F),
                   condition2 = c(T, F, T, F))

Conditional Sum

The sum of kg when both condition1 and condition2 are TRUE is,

sum(data$kg[data$condition1 & data$condition2])

So then this expands to,

sum(data$kg[data$condition1 & data$condition2])
sum(data$ha[data$condition1 & data$condition2])
mean(data$kg[data$condition1 & data$condition2])
mean(data$ha[data$condition1 & data$condition2])

Or you can do this on both at once,

colSums(data[data$condition1 & data$condition2, c("kg", "ha")])
colMeans(data[data$condition1 & data$condition2, c("kg", "ha")])

R version of SUMIFS or AVERAGEIFS?

4 Answers4

data