
Here's my dataset:

(Screenshot of the dataset; image not available.)

Can anyone show me how to get the p-value per customer?

Expected output: p-value for AAA, BBB, CCC, DDD, EEE

Thanks!

mgh
  • Can you provide reproducible data using something like [dput](https://stackoverflow.com/questions/49994249/example-of-using-dput)? We cannot readily make a dataframe from your question. – LMc Jan 14 '21 at 21:51
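
As a minimal sketch of what that comment is asking for: `dput()` prints R code that recreates an object, so it can be pasted into a question instead of a screenshot. The values below are placeholders assumed for illustration, not the data from the screenshot; the column names follow the ones used in the answer.

```r
# Hypothetical stand-in for the screenshot data (values are assumptions)
df <- data.frame(
  Customer  = c("AAA", "AAA", "BBB", "BBB"),
  Age.Group = c("18-34", "35-44", "18-34", "35-44"),
  case      = c(82L, 100L, 92L, 110L),
  control   = c(100L, 45L, 95L, 40L)
)

# Print R code that rebuilds df; paste this output into the question
dput(df)

# Round trip: evaluating the printed code gives back the same data frame
txt <- capture.output(dput(df))
df2 <- eval(parse(text = paste(txt, collapse = "\n")))
all.equal(df, df2)
```

For a large dataset, `dput(head(df, 20))` keeps the pasted output short.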

1 Answer


Try playing around with this:

library(dplyr)
#Code
df %>% 
  group_by(Customer) %>% 
  summarise(pval = chisq.test(Age.Group, case)$p.value)

Output:

# A tibble: 2 x 2
  Customer  pval
  <chr>    <dbl>
1 AAA      0.213
2 BBB      0.213

Some data used:

#Data
df <- structure(list(Age.Group = c("18-34", "35-44", "45-54", "55-64", 
"18-34", "35-44", "45-54", "55-64"), Customer = c("AAA", "AAA", 
"AAA", "AAA", "BBB", "BBB", "BBB", "BBB"), case = c(82L, 100L, 
200L, 12L, 92L, 110L, 210L, 22L), control = c(100L, 45L, 23L, 
9L, 95L, 40L, 18L, 4L)), class = "data.frame", row.names = c(NA, 
-8L))
Duck
  • That p value does not look right. 0.213 with such huge differences between the groups? And exactly the same in both groups? – thelatemail Jan 14 '21 at 22:46
  • @thelatemail Hi, it is only an example with dummy data; the real data is only available as a screenshot! – Duck Jan 14 '21 at 22:48
  • I mean the p.value with your dummy data, it doesn't match with the patterns in `df` as shown - `by(df[c("case","control")], df["Customer"], FUN=function(x) chisq.test(x)$p.value )` gives hugely different results. – thelatemail Jan 14 '21 at 22:48
  • Ah, I think I see my issue - you are doing `chisq.test(x,y)` which I don't think fits with the count data as shown, while I was doing `chisq.test(x)` which calculates the statistic on the matrix of counts. – thelatemail Jan 14 '21 at 22:54
  • @thelatemail That sounds good, it depends on the OP goal too! – Duck Jan 14 '21 at 22:55
  • @thelatemail: what you gave me was what I needed. I think the only issue I'm running into is that I need the output to be in a table rather than as separate values. How can I get the output in tabular form, with customer and p-value as the columns? Thanks! – mgh Jan 15 '21 at 01:50
  • @mgh - I think you have to edit Duck's code a bit so that each sub-matrix is considered - something like `df %>% group_by(Customer) %>% summarise(pval = chisq.test(cbind(case,control))$p.value)` – thelatemail Jan 15 '21 at 02:00
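
The point worked out in the comments can be sketched in base R as follows. This is only an illustration on the dummy data from the answer, not the real data, and it assumes the goal is a test of the case/control counts across age groups within each customer. `chisq.test(x, y)` with two vectors cross-tabulates them as factors, so the sizes of the counts never enter the test, while `chisq.test()` on a matrix treats the entries as counts. The names `pvals` and `res` are made up for this sketch.

```r
# Dummy data from the answer above (not the OP's real data)
df <- data.frame(
  Age.Group = rep(c("18-34", "35-44", "45-54", "55-64"), times = 2),
  Customer  = rep(c("AAA", "BBB"), each = 4),
  case      = c(82L, 100L, 200L, 12L, 92L, 110L, 210L, 22L),
  control   = c(100L, 45L, 23L, 9L, 95L, 40L, 18L, 4L)
)

# chisq.test(x, y) treats both vectors as factors and tabulates them:
# 4 age groups x 4 distinct "case" values, one observation per row.
# The statistic is 12 on 9 df whatever the counts are, hence p = 0.213.
aaa <- df[df$Customer == "AAA", ]
p_factor <- suppressWarnings(chisq.test(aaa$Age.Group, aaa$case)$p.value)

# Test on the count matrix instead: one 4 x 2 matrix per customer
pvals <- sapply(
  split(df[c("case", "control")], df$Customer),
  function(x) chisq.test(as.matrix(x))$p.value
)

# Tabular form: one row per customer
res <- data.frame(Customer = names(pvals), pval = unname(pvals))
res
```

`res` is a plain data frame with `Customer` and `pval` columns, which is the tabular form asked for; the `summarise(pval = chisq.test(cbind(case, control))$p.value)` version from the last comment should give the same p-values as a tibble.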