0

I need some help creating a matrix. I have a large dataset with multiple groups. Each group is sorted into cases and non cases.

For ex.

Group Cases Noncases
GroupA 4 7
GroupB 9 4
GroupC 10 3

I want to create a matrix which will compare one group to the sum of the other groups.

For instance:

Disease Category GroupA NotGroupA
Case 4 19
Noncase 7 7

The goal is to set up a matrix which will allow me to run a chisquare test and/or a Fisher's exact test (depending on sample size).

I have tried the following code to extrapolate values from my dataframe into a matrix:

GroupA <- as.table(matrix(c(df[1,3], df[1,4], (sum(df$group_cases)-df$group_cases[1])), (sum(df$Noncases)-df$Noncases[1])), nrow=2, ncol=2,
                dimnames=list(Group= c("A", "Other"),
                              Case = c(1, 0)))

However, I get the following error:

Warning message:
In matrix(c(df[1, 3], df[1, 4], (sum(df$group_cases) -  :
  data length [3] is not a sub-multiple or multiple of the number of rows [329]

It outputs a 329 row list instead of a 2 by 2 matrix.

Because I have many groups, I want R to calculate the values for me when constructing the matrix. I don't want to calculate the "NotGroup_" column separately, as that makes room for human error.

How would you all recommend constructing this matrix, and is it possible to have R calculate the sums of columns/subtract values while creating a matrix?

Thank you for your help!

r2evans
  • 141,215
  • 6
  • 77
  • 149
gbg
  • 69
  • 5

2 Answers2

2

dplyr

library(dplyr)
library(tidyr) # pivot_*
dat %>%
  mutate(Group = ifelse(Group == "GroupA", "GroupA", "NotGroupA")) %>%
  pivot_longer(-Group, names_to = c("Case")) %>%
  pivot_wider(Case, names_from = Group, values_from = value, values_fn = list(value = sum))
# # A tibble: 2 x 3
#   Case     GroupA NotGroupA
#   <chr>     <int>     <int>
# 1 Cases         4        19
# 2 Noncases      7         7

base R

dat2 <- transform(dat, Group = ifelse(Group == "GroupA", "GroupA", "NotGroupA"))
aggregate(. ~ Group, data = dat2, FUN = sum)
#       Group Cases Noncases
# 1    GroupA     4        7
# 2 NotGroupA    19        7

(though the axes are reversed)


Data

dat <- structure(list(Group = c("GroupA", "GroupB", "GroupC"), Cases = c(4L, 9L, 10L), Noncases = c(7L, 4L, 3L)), class = "data.frame", row.names = c(NA, -3L))

A related link provides many ways to "summarize by group": Calculate the mean by group

r2evans
  • 141,215
  • 6
  • 77
  • 149
1

Set up example:

dd <- data.frame(Group = LETTERS[1:3], Cases = c(4, 9, 10),
                 Noncases = c(7,4,3))

Function:

mktab <- function(focal, data) {
    ## subset rows according to whether $Group == focal or not
    ## subset cols according to "Cases"/"Noncases"
    ## sum() the not-focal elements
    matrix(c(data[data$Group==focal, "Cases"],
             sum(data[data$Group!=focal, "Cases"]),
             data[data$Group==focal, "Noncases"],
             sum(data[data$Group!=focal, "Noncases"])
             ),
           nrow = 2,
           byrow=TRUE,
           dimnames = list(c("Case", "Noncase"),
                           c(focal, paste0("not_", focal)))
           )
}

mktab("A", dd)
             

Results:

        A not_A
Case    4    19
Noncase 7     7
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453