
I have a data frame with the variables question_ID and estimate: 210 questions, each answered by 32 people (6,720 observations). For each estimate I want to compute the log10 and subtract the median of the logs for that question.

E.g. for question 1: sum(log10(Estimates1) - median1)/32, for question 2: sum(log10(Estimates2) - median2)/32, and so on up to question 210, so that I end up with 210 values, one per question.

So far I calculated the median for each question:

m <- data %>% group_by(question_ID) %>% summarize(m=median(log10(estimate)))

I'm looking for an elegant solution where I don't need to come up with 210 subsets. Any ideas?

Thanks in advance!

NelsonGon
Julia
  • Please edit as shown [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – NelsonGon Feb 27 '20 at 09:17
  • Hi Julia. Welcome to Stack Overflow! Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! – dario Feb 27 '20 at 09:21

2 Answers

You can do this using base R functions. ave applies a function to a vector by subsets and returns a result the same length as the original vector.

# Calculate the medians within the dataframe using the ave function
data$logmedians <- ave( log(data$estimate,10) , data$question_ID, FUN=median)

# Now generate the difference between the log medians and the individual answers
data$diflogs <- log(data$estimate, 10) - data$logmedians

I think this is the simplest way to understand. You can neaten things up using within and doing the entire calculation in the ave function:

data <- within(data, {
   # ave() was missing its closing parenthesis in the original
   diflogs <- ave(estimate, question_ID, FUN = function(x) log(x, 10) - median(log(x, 10)))
   })
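Both versions above leave one difference per row. To get the single value per question that the question asks for, the differences still have to be averaged within each question. A minimal base R sketch, assuming a made-up `toy` data frame (the values and the names `toy` and `per_question` are hypothetical, not from the question):

```r
# Hypothetical toy data: 2 questions, 3 respondents each
toy <- data.frame(
  question_ID = rep(1:2, each = 3),
  estimate    = c(1, 10, 1000, 10, 100, 100000)
)

# Per-row deviation of log10(estimate) from its question's median log
toy$diflogs <- ave(log10(toy$estimate), toy$question_ID,
                   FUN = function(x) x - median(x))

# Collapse to one value per question (mean = sum / number of responses)
per_question <- tapply(toy$diflogs, toy$question_ID, mean)
per_question
```

Using `mean` rather than dividing a sum by 32 keeps the result correct if some questions have fewer responses.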

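Note that the median of the logs is generally not the same as the log of the median when a question has an even number of responses (as with 32 here): the median then averages the two middle values, and the log of an average is not the average of the logs. With an odd number of responses the two coincide. Be careful about exactly which you want.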
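A quick way to see the difference, sketched with made-up numbers (the vectors `odd` and `even` are hypothetical, not from the question's data):

```r
# Hypothetical toy values
odd  <- c(1, 10, 1000)        # 3 responses
even <- c(1, 10, 100, 1000)   # 4 responses

# Odd count: the median is an actual data point, so the two agree
med_log_odd <- median(log10(odd))    # median of 0, 1, 3
log_med_odd <- log10(median(odd))    # log10(10)

# Even count: the median averages the two middle values, so they differ
med_log_even <- median(log10(even))  # mean of 1 and 2
log_med_even <- log10(median(even))  # log10(55)

c(med_log_odd, log_med_odd, med_log_even, log_med_even)
```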

George Savva

You can first calculate the log10 of the estimates, then for each question subtract the median log from each value, sum the differences and divide by 32.

library(dplyr)

data %>% 
 mutate(log_m = log10(estimate)) %>% 
 group_by(question_ID) %>% 
 summarize(m = sum(log_m - median(log_m))/32)
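If some questions might have fewer than 32 answers, the hard-coded 32 can be replaced by the group size. A minimal sketch of that variant on made-up data, assuming dplyr is installed (the `toy` data frame and the `res` name are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: 2 questions, 3 respondents each
toy <- data.frame(
  question_ID = rep(1:2, each = 3),
  estimate    = c(1, 10, 1000, 10, 100, 100000)
)

res <- toy %>%
  mutate(log_m = log10(estimate)) %>%
  group_by(question_ID) %>%
  # n() is the number of rows in the current group
  summarize(m = sum(log_m - median(log_m)) / n())

res
```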
Ronak Shah