0

My data looks something like this, what i want to do now is create a new column "Mean" where i want to find the mean of "customer accepted" for all the same values in the "price" column.

  Product | Price | Customer Accepted
       A      17.2         1
       A      16.8         0
       A      17.2         1
       B      21           1
       B      16.8         0
       A      21           0
       C      17.2         0

For example, 17.2 repeats 3 times and the mean of its corresponding customer accepted values is (1+1+0/3 = 0.66); similarly for 16.8 its (0+0/2 = 0), for 21 it is (1+0/2 = 0.50); the new column, "Mean" should have these values when ever the same price repeats.

My expected output

  Product | Price | Customer Accepted | Mean
       A      17.2         1            0.66   
       A      16.8         0             0 
       A      17.2         1            0.66
       B      21           1            0.50
       B      16.8         0             0 
       A      21           0            0.50 
       C      17.2         0            0.66

There are around 950 different levels for the price value, and the number of times each value repeats is not consistent. Can somebody help me with this ? Thanks a lot

David
  • 43
  • 4

4 Answers4

1

We can use data.table

library(data.table)
 setDT(df)[, Mean := mean(Accpeted), Price]
akrun
  • 874,273
  • 37
  • 540
  • 662
0

There is something called grouping in most big data analysis packages, like for example the data.table package. You could look into that, of course. But some vanilla R possibility would be that one here: This however is not optimal for the sake of readability. (Values are recalculated although they could have been cached.)

a = data.frame(
  product = c("A", "A", "A", "B", "B", "A", "C"), 
  price = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2), 
  accepted = c(1, 0, 1, 1, 0, 0)
)

invisible(
  lapply(1:nrow(a), function(i) { 
    a[i, "mean"] <<- mean(a[a$price == a[i, "price"], "accepted"])
  })
)

This does literally what you were trying to do: Iterating through each row, assign a new value to the data.frame that is the mean of all accepted-values where the price is the same as in this row.

K. Rohde
  • 9,439
  • 1
  • 31
  • 51
0

I hope I understood you correctly and below is the code which can be used to do the same.

 df <- data.frame(Product = c("A","A","A","B","B","A","C"),Price = c(17.2,16.8,17.2,21,16.8,21,17.2),Accpeted = c(1,0,1,1,0,0,0))

df$mean <- ave(df$Accpeted,df$Price,FUN=mean)

I have used ave function from Base R.

Output:

    Product Price Accpeted      mean
1       A  17.2        1 0.6666667
2       A  16.8        0 0.0000000
3       A  17.2        1 0.6666667
4       B  21.0        1 0.5000000
5       B  16.8        0 0.0000000
6       A  21.0        0 0.5000000
7       C  17.2        0 0.6666667
PKumar
  • 10,971
  • 6
  • 37
  • 52
0

The dplyr approach would look like this.

library(dplyr)
df <- data.frame(Product = c("A","A","A","B","B","A","C"), Price = c(17.2,16.8,17.2,21,16.8,21,17.2), CustomerAccepted=c(1,0,1,1,0,0,0))

df.summ <- 
    df %>% 
    group_by(Price) %>%
    summarise(Mean = mean(CustomerAccepted))
Andrew Chisholm
  • 6,362
  • 2
  • 22
  • 41