Amend a column in df with a mean based on specific strings in another column

Question

Here is an example df:

names  count  x    y
02.3   89    va1  31
02.3   44    va1  22 
01.8   15    va2  12
01.8   17    va2  3
03.3   24    va3  3
03.3   21    va3  9

I want to calculate the mean count for two categories of names: 02.3 and 01.8 should be relabelled as high, and 03.3 as low. I also want to keep all the other columns in the df. My expected output looks like this:

names  count   x   y
high  41.25  va1  31
high  41.25  va1  22 
high  41.25  va2  12
high  41.25  va2  3
low   22.5   va3  3
low   22.5   va3  9

How can I do this?

score 2 · Answer 1 · answered Mar 11 '21 at 12:45

Updating it in two steps like this seems to work:


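library(dplyr)

# read.table() parses names as numeric (2.3, 1.8, 3.3),
# so sprintf() restores the zero-padded strings "02.3", "01.8", "03.3"
df <- within( read.table(text=
"names  count  x    y
02.3   89    va1  31
02.3   44    va1  22
01.8   15    va2  12
01.8   17    va2  3
03.3   24    va3  3
03.3   21    va3  9
", header=TRUE ),
{
    names <- sprintf( "%04.1f", names )
})

# Step 1: relabel names; step 2: replace count with its group mean
df %>%
    mutate(
        names = ifelse( names == "03.3", "low", "high" )
    ) %>% group_by( names ) %>%
    mutate(
        count = mean(count)
    )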
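
The relabelling above treats every name other than "03.3" as high. If the data might contain other names, a more explicit mapping could be safer. The following is only a sketch under that assumption: it rebuilds the example df, uses dplyr::case_when() with %in%, and leaves any unlisted name as NA instead of silently calling it high. The variable name result is just illustrative.

```r
library(dplyr)

# Example data from the question, with names kept as character
df <- data.frame(
  names = c("02.3", "02.3", "01.8", "01.8", "03.3", "03.3"),
  count = c(89L, 44L, 15L, 17L, 24L, 21L),
  x     = c("va1", "va1", "va2", "va2", "va3", "va3"),
  y     = c(31L, 22L, 12L, 3L, 3L, 9L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Names not listed in either condition become NA
  mutate(names = case_when(
    names %in% c("02.3", "01.8") ~ "high",
    names == "03.3"              ~ "low"
  )) %>%
  group_by(names) %>%
  mutate(count = mean(count)) %>%   # one mean per high/low group
  ungroup()                         # drop grouping for later steps

print(result)
```

The trailing ungroup() is optional; it only stops the grouping from carrying over into later operations.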

score 1 · Answer 2 · answered Mar 11 '21 at 14:04

tidyverse

library(dplyr)
df %>%
  mutate(names = if_else(names == "03.3", "low", "high")) %>%
  group_by(names) %>%
  mutate(count = mean(count))
# # A tibble: 6 x 4
# # Groups:   names [2]
#   names count x         y
#   <chr> <dbl> <chr> <int>
# 1 high   41.2 va1      31
# 2 high   41.2 va1      22
# 3 high   41.2 va2      12
# 4 high   41.2 va2       3
# 5 low    22.5 va3       3
# 6 low    22.5 va3       9

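If you're using dplyr, you should almost always use if_else over base::ifelse, as base::ifelse has some issues (for example, it drops attributes such as the Date class and does not check that both branches have the same type).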
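
A minimal sketch of one such difference, using a made-up Date vector that is not part of the question's data: base::ifelse() returns bare numbers because it drops the Date class, while dplyr::if_else() keeps it. The variable names d, cutoff, base_result and dplyr_result are only illustrative.

```r
library(dplyr)

# Hypothetical dates, chosen only for illustration
d      <- as.Date(c("2021-03-11", "2021-03-12"))
cutoff <- as.Date("2021-03-11")

# base::ifelse() strips attributes, so the Date class is lost
base_result <- ifelse(d > cutoff, d, as.Date(NA))

# dplyr::if_else() preserves the type of its yes/no arguments
dplyr_result <- if_else(d > cutoff, d, as.Date(NA))

print(class(base_result))
print(class(dplyr_result))
```

For the character labels in this question both functions give the same result, so the choice matters mainly as a habit.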

base R

df <- transform(df, names = ifelse(names == "03.3", "low", "high"))
df$count <- ave(df$count, df$names, FUN = mean)

(Actually, mean is the default function, so you could do ave(df$count, df$names).)
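
As a variation on the base R approach, the relabelling could also be done with a named lookup vector, which spells out every name explicitly. This is only a sketch: lookup is a hypothetical helper name, the df is rebuilt from the question's data, and ave() is called without FUN to rely on its default of mean.

```r
# Example data from the question, with names kept as character
df <- data.frame(
  names = c("02.3", "02.3", "01.8", "01.8", "03.3", "03.3"),
  count = c(89L, 44L, 15L, 17L, 24L, 21L),
  x     = c("va1", "va1", "va2", "va2", "va3", "va3"),
  y     = c(31L, 22L, 12L, 3L, 3L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: original name -> category
lookup <- c("02.3" = "high", "01.8" = "high", "03.3" = "low")

# Index the lookup by name; names missing from it would become NA
df$names <- unname(lookup[df$names])

# ave() uses mean by default and returns one value per row
df$count <- ave(df$count, df$names)

print(df)
```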

Data

df <- structure(list(
  names = c("02.3", "02.3", "01.8", "01.8", "03.3", "03.3"),
  count = c(89L, 44L, 15L, 17L, 24L, 21L),
  x = c("va1", "va1", "va2", "va2", "va3", "va3"),
  y = c(31L, 22L, 12L, 3L, 3L, 9L)
), row.names = c(NA, -6L), class = "data.frame")
