Is there a way to calculate an asymmetrical mean (e.g. from the 0.05 quantile to the 0.5 quantile) by group using the aggregate command? R-STUDIO

Question

I am applying Tukey's outlier detection algorithm to a data set of prices.

I need it to be calculated by group (the grouping variable is in the same data set). This works fine with the aggregate command until I need two further means: one using only the data between the 5th percentile and the median, and one using only the data between the median and the 95th percentile.

As far as I know, a symmetrically trimmed mean is computed with aggregate(doc$x, by=list(doc$group), FUN=mean, trim = 0.05), which drops the upper and lower 5% of the data (10% in total) before averaging. I don't know how to do the next step: computing the lower and upper means with the median as the dividing point, while still leaving out the lowest and highest 5%.

medlow <- aggregate(doc1$`rp`, by=list(doc1$`Código Artículo`), FUN=mean,trim =c(0.05,0.5))
medup <- aggregate(doc1$`rp`, by=list(doc1$`Código Artículo`), FUN=mean,trim =c(0.5,0.95))

medtrunc <- aggregate(doc1$`rp`, by=list(doc1$`Código Artículo`), FUN=mean,trim = 0.05)

I expect the output to be the number I need for each group, but instead I get:

Error in mean.default(X[[i]], ...) : 'trim' must be numeric of length one.

score 0 · Accepted Answer · answered Aug 24 '19 at 11:59

First, I think you are using aggregate and trim incorrectly. The message 'trim' must be numeric of length one means that trim takes a single fraction, and that fraction is removed from both the lower and the upper tail of the distribution:

df = data.frame(
  gender = c(
    "male","male","male","male","female","female","female", "female"
    ),
  score = rnorm(8, 10, 2)
  )
aggregate(score ~ gender, data = df, mean, trim = 0.1)

  gender     score
1 female 11.385263
2   male  9.954465
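
To make the trim semantics concrete, here is a small sketch with made-up numbers (the vector `x` is illustrative and not taken from the question). As far as base R's `mean()` is concerned, `floor(n * trim)` observations are dropped from each end of the sorted data, and a `trim` of length two is rejected rather than interpreted as an asymmetric window.

```r
# Illustrative data: ten values with one extreme observation.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# trim = 0.1 drops floor(10 * 0.1) = 1 value from EACH end,
# so this is the mean of 2, 3, ..., 9.
trimmed <- mean(x, trim = 0.1)
manual  <- mean(sort(x)[2:9])
stopifnot(isTRUE(all.equal(trimmed, manual)))

# A vector of two fractions is not accepted: the call fails
# instead of trimming asymmetrically.
res <- try(mean(x, trim = c(0.05, 0.5)), silent = TRUE)
inherits(res, "try-error")
```

The exact wording of the error can differ between R versions, which is why the sketch only checks that the call fails.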

To split the data at the median and compute a trimmed mean for each half, you can add a new variable MedianSplit to the data frame with a simple for loop:

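# Label each row by whether its score is at or below the overall median
med <- median(df$score)          # compute the median once, outside the loop
df$MedianSplit <- NA_character_  # character column to hold the labels
for (i in seq_len(nrow(df))) {
  if (df$score[i] <= med) {
    df$MedianSplit[i] <- "lower"
  } else {
    df$MedianSplit[i] <- "upper"
  }
}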

df

  gender     score MedianSplit
1   male  7.062605       lower
2   male  9.373052       upper
3   male  6.592681       lower
4   male  7.298971       lower
5 female  7.795813       lower
6 female  7.800914       upper
7 female 12.431028       upper
8 female 10.661753       upper
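
The same labelling can be done without iterating row by row. The sketch below uses made-up scores (not the random draw shown above) and shows two variants: a split at the overall median, which follows the same rule as the loop, and a split at each group's own median via `ave()`, which may be closer to what a by-group analysis needs. The column name `MedianSplitByGroup` is hypothetical.

```r
# Illustrative data with fixed values so the result is reproducible.
df <- data.frame(
  gender = rep(c("male", "female"), each = 4),
  score  = c(7, 9, 6, 8, 10, 12, 11, 13)
)

# Same rule as the loop: compare every score with the overall median.
df$MedianSplit <- ifelse(df$score <= median(df$score), "lower", "upper")

# Variant: compare each score with the median of its own group.
# ave() returns, for every row, the median of that row's gender group.
group_median <- ave(df$score, df$gender, FUN = median)
df$MedianSplitByGroup <- ifelse(df$score <= group_median, "lower", "upper")

df
```

With these toy values the overall median puts every male score in "lower" and every female score in "upper", while the per-group split divides each gender in half. Which variant is appropriate depends on whether the median is meant to be global or per group.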

Then, use aggregate to compute the trimmed means:

For data at or below the median (i.e., [0, 0.5]):

aggregate(
  score ~ gender, 
  data = df[ which(df$MedianSplit == "lower"), ], 
  mean, trim = 0.05
)

  gender    score
1 female 7.795813
2   male 6.984752

and for those above the median (i.e., [0.5, 1]):

aggregate(
  score ~ gender, 
  data = df[ which(df$MedianSplit == "upper"), ], 
  mean, trim = 0.05
)

  gender     score
1 female 10.297898
2   male  9.373052

To put this in context: imagine that you have males and females from different places, 270 in total, and you want all of those numbers for each place. That is why I was trying to use 'aggregate' to simplify the code. Also, the trim wouldn't match in the end, because I need the mean from 0.05 to the median (0.5), not a trim of 0.45, and likewise for the upper side. — MelaniaCB, Aug 25 '19 at 14:24
@MelaniaCB please edit your question and provide a minimal example using `dput`. If you have two categorical variables or more, using `dplyr` and `group_by` followed by `mutate_at` will help you more than you can imagine. I think a `tidyverse` approach suits your condition — maaniB, Aug 25 '19 at 14:36
I could make it work by creating the function infmean <- function( x ){ sort(x); inf <- mean(quantile(x, 0.05):median(x)); return(inf) } and using it with aggregate: medinf <- aggregate(doc$`x`, by=list(doc$`Group`), FUN=infmean) — MelaniaCB, Aug 28 '19 at 04:44
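
A note on the function in the last comment: `quantile(x, 0.05):median(x)` builds a sequence in steps of 1 starting at the 5th percentile, rather than selecting the observations that lie between the two values, and the result of `sort(x)` is discarded because it is never assigned. A minimal sketch of a subset-based alternative follows. The helper name `window_mean` and the toy data are hypothetical, and including both endpoints of the window (so the median observation can count in both halves) is an assumption.

```r
# Mean of the observations lying between two quantiles of x (inclusive).
# window_mean is a hypothetical helper, not part of base R.
window_mean <- function(x, lower = 0.05, upper = 0.5) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  mean(x[x >= q[1] & x <= q[2]], na.rm = TRUE)
}

# Toy data: two groups of 21 prices each.
doc <- data.frame(
  group = rep(c("a", "b"), each = 21),
  x     = c(0:20, seq(100, 300, by = 10))
)

# Lower mean (5th percentile to median) and upper mean (median to 95th),
# computed separately within each group. Extra arguments after FUN are
# passed on to window_mean by aggregate().
medlow <- aggregate(x ~ group, data = doc, FUN = window_mean,
                    lower = 0.05, upper = 0.5)
medup  <- aggregate(x ~ group, data = doc, FUN = window_mean,
                    lower = 0.5, upper = 0.95)
medlow
medup
```

Because the quantiles are computed inside the function, each group gets its own 5th percentile, median, and 95th percentile, so no separate split column is needed.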
