0

I collected some data over the summer that I am trying to analyze in a simple way. I have a few columns, but the data itself spans through 5 columns and 100+ rows.

Each row is a single participant with a participant ID. I am trying to drop their min and max measurements in their row, through all 5 columns, thus having only 3 columns left. I also need to remove any individuals who do not have all 5 measurements. Then, with those three columns, I will need to average and create a new 4th column that contains the average.

I'm getting a little overwhelmed with the different ways to do this. It needs to be able to loop through all the rows. Please let me know if you have any suggestions.

This is what the dataframe (eumR) looks like...

> eumR town id eumr1 eumr2 eumr3 eumr4 eumr5 1 WLS SA4001 NA NA NA NA NA 2 WLS SA4002 NA NA NA NA NA 3 WLS SA4003 NA NA NA NA NA 4 WLS SA4004 NA NA NA NA NA 5 WLS SA4005 NA NA NA NA NA .... 80 BVL SA4080 12.6 15.2 13.6 14.9 14.2 81 BVL SA4081 15.2 16.5 16.5 17.3 20.5 82 BVL SA4082 15.1 14.8 14.5 14.6 16.7 83 BVL SA4083 15.6 15.8 15.8 15.5 13.2 84 BVL SA4084 15.7 15.8 17.3 17.9 20.8

slava-kohut
  • 4,203
  • 1
  • 7
  • 24
  • 7
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Try to focus on one specific question at a time. – MrFlick Nov 05 '19 at 16:19

3 Answers3

3

You can try something like this:

eumR <- eumR[complete.cases(eumR),]

eumR$mean <- apply(eumR, 1, function(x) {
  x <- as.numeric(x[3:7])
  mean(x[!(x %in% c(min(x),max(x)))])})

I cannot debug without a reproducible example, please provide one.

slava-kohut
  • 4,203
  • 1
  • 7
  • 24
0

This is also a tidyverse solution. Would convert from wide to long format with pivot_longer which would be preferable for managing this kind of data. drop_na will remove rows of data with missing values.

For each town and id, would filter to ensure you have 5 rows of data for each as required. Then, would rank the values (with ties determined randomly). This will ensure you can calculate the mean based on 3 out of the 5 values in cases of ties for minimum or maximum.

library(tidyverse)

df %>%
  pivot_longer(cols = starts_with("eumr"), names_to = "eumr", values_to = "value") %>%
  drop_na(value) %>%
  group_by(town, id) %>%
  filter(n() == 5) %>%
  mutate(the_rank  = rank(-value, ties.method = "random")) %>%
  filter(the_rank > 1 & the_rank < 5) %>%
  summarise(mean_value = mean(value))
Ben
  • 28,684
  • 5
  • 23
  • 45
0

Okay, with everyone's help I was able to get to this solution:

install.packages("NCmisc") library(NCmisc)

Function to drop min & max of a list of numbers, and return average

drop_avg = function(v){
  new = v[v != max(v) & v!= min(v)] # remove min & max
  avg = mean(new)
  return(avg)
 }

Create new columns for averages

data$avg_E_Inner_R = NA 
data$avg_E_Inner_L = NA
data$avg_M_Inner_R = NA
data$avg_M_Inner_L = NA 

Loop through rows of data frame "data"

Drops min & max values, averages them, puts avg in another column in data frame

for (row in 1:nrow(data)){
  E_Inner_R = data[row, c("E_Inner_R_Arm_1","E_Inner_R_Arm_2","E_Inner_R_Arm_3","E_Inner_R_Arm_4","E_Inner_R_Arm_5")] 
  data[row,]$avg_E_Inner_R = drop_avg(E_Inner_R)

  E_Inner_L = data[row, c("E_Inner_L_Arm_1","E_Inner_L_Arm_2","E_Inner_L_Arm_3","E_Inner_L_Arm_4","E_Inner_L_Arm_5")]
  data[row,]$avg_E_Inner_L = drop_avg(E_Inner_L)
  
  M_Inner_R = data[row, c("M_Inner_R_Arm_1","M_Inner_R_Arm_2","M_Inner_R_Arm_3","M_Inner_R_Arm_4","M_Inner_R_Arm_5")]
  data[row,]$avg_M_Inner_R = drop_avg(M_Inner_R)
  
  M_Inner_L = data[row, c("M_Inner_L_Arm_1","M_Inner_L_Arm_2","M_Inner_L_Arm_3","M_Inner_L_Arm_4","M_Inner_L_Arm_5")]
  data[row,]$avg_M_Inner_L = drop_avg(M_Inner_L)
}