I'm a newer R user. My code works, but I suspect there is a dplyr or purrr function that does this more efficiently and with far less code; if there is, I haven't found it yet. My PI wants a summary of our race data. The trick is that it should show a count for each single race and, for respondents who selected more than one race, a breakdown of counts for those combinations. I subset the data to just those columns, added the columns pairwise within each row, and wrote the results to a new 7x7 matrix to get the sums for each.

This is my code. Is there a much more efficient way of doing this?

# sum races to create totaled matrix of all races

subset <- dataset[, 11:17]
test <- matrix(,nrow=7, ncol=7)

colnames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")

rownames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")

# basic design: if == 1 then strictly one race; if > 1, put in the appropriate category

test[1,1] <- sum(subset$African_American==1, na.rm=TRUE)

test[1,2] <- sum(subset$African_American+subset$Asian>1, na.rm=TRUE)

test[1,3] <- sum(subset$African_American+subset$Hawaiian_Pacific>1, na.rm=TRUE)

test[1,4] <- sum(subset$African_American+subset$Native_Alaskan>1, na.rm=TRUE)

test[1,5] <- sum(subset$African_American+subset$White_Euro>1, na.rm=TRUE)

test[1,6] <- sum(subset$African_American+subset$Hispanic_Latino>1, na.rm=TRUE)

test[1,7] <- sum(subset$African_American+subset$`No-Answer`>1, na.rm=TRUE)

test[2,1] <- sum(subset$Asian+subset$African_American>1, na.rm=TRUE)

test[2,2] <- sum(subset$Asian==1, na.rm=TRUE)...

There are seven columns to add to each other, so the code moves all the way through the matrix and outputs something similar to this, where the diagonal holds the counts of only one race and the other cells hold multiple occurrences: matrix

The Chez
  • 21
  • 3
  • My suggestion would be to provide a reproducible example, the algorithm used to make the calculations, show your attempts and the desired output you're after. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to provide data in an easy-to-paste manner. – Roman Luštrik Aug 21 '18 at 06:20

1 Answer

I found a way that does not use dplyr but the base R function apply.

data = data.frame(set1 = round(runif(n = 10,min = 0,max = 1)),
              set2 = round(runif(n = 10,min = 0,max = 1)),
              set3 = round(runif(n = 10,min = 0,max = 1)),
              set4 = round(runif(n = 10,min = 0,max = 1)),
              set5 = round(runif(n = 10,min = 0,max = 1)),
              set6 = round(runif(n = 10,min = 0,max = 1)),
              set7 = round(runif(n = 10,min = 0,max = 1))
)
res = apply(combn(1:ncol(data), 2), 2, function(x) sum(data[, x[1]] & data[, x[2]]))
test <- matrix(0,nrow=7, ncol=7)
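# combn() lists pairs as (1,2), (1,3), ..., (6,7), but upper.tri() fills column by column, so index by the pairs directly
test[t(combn(1:ncol(data), 2))] = res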
> test
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    5    3    2    2    4    2    2
[2,]    0    5    5    3    4    5    4
[3,]    0    0    6    3    1    0    5
[4,]    0    0    0    8    3    3    1
[5,]    0    0    0    0    2    2    2
[6,]    0    0    0    0    0    6    3
[7,]    0    0    0    0    0    0    6

The first part produces some test data. combn(1:ncol(data), 2) generates every combination of 2 columns, and apply calls the function once per combination. The & operator then returns TRUE for every row where both data[, x[1]] and data[, x[2]] (the 2 selected columns) are 1, and sum counts these rows. This gives the desired values. The following two lines construct a matrix as you wanted. Please note that with the addition of

res2 = apply(combn(1:ncol(data), 1), 2, function(x) sum(data[, x[1]]))
test[cbind(1:7,1:7)] <- res2

you can also set the diagonal to the correct counts. Anyway, this only handles respondents who answered 1 in 2 columns. It won't find those who are Asian, Hispanic and American. But you can compute this with a slight change, using combinations of 3 columns:

apply(combn(1:ncol(data), 3), 2, function(x) sum(data[, x[1]] & data[, x[2]] & data[, x[3]]))
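The 3-column call returns a bare vector, so it can be hard to tell which count belongs to which combination. One possible way to label the results is sketched below. The data, the column names `A` to `D`, and the `"A+B+C"` label format are made up for illustration and are not part of the original question.

```r
# Hypothetical 0/1 indicator data; column names are invented for this sketch.
set.seed(1)
data <- as.data.frame(matrix(rbinom(10 * 4, 1, 0.5), ncol = 4,
                             dimnames = list(NULL, c("A", "B", "C", "D"))))

# All 3-column combinations, one combination of column indices per matrix column.
triples <- combn(ncol(data), 3)

# For each triple, count the rows where all three columns equal 1.
counts <- apply(triples, 2, function(x) sum(rowSums(data[, x] == 1) == 3))

# Name each count after the columns it combines, e.g. "A+B+C".
names(counts) <- apply(triples, 2, function(x) paste(names(data)[x], collapse = "+"))
counts
```

The same pattern should work for the 2-column case by changing the 3s to 2s.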

Please also note that my random data may be unrealistic and not representative.
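As a possible alternative to looping over column pairs, the whole table of pairwise counts can be obtained in one step with the base R function crossprod, assuming the columns are coded strictly as 0/1. This is only a sketch: the data below is random, the column names are placeholders, and treating NA as 0 is an assumption about how missing answers should be handled.

```r
# Hypothetical 0/1 indicator data with placeholder column names.
set.seed(42)
races <- data.frame(set1 = rbinom(10, 1, 0.5),
                    set2 = rbinom(10, 1, 0.5),
                    set3 = rbinom(10, 1, 0.5))

m <- as.matrix(races)
m[is.na(m)] <- 0   # assumption: a missing answer counts as "not selected"

# crossprod(m) is t(m) %*% m: cell [i, j] counts the rows that have a 1 in
# both column i and column j; the diagonal holds each column's total of 1s.
co <- crossprod(m)
co
```

The result is symmetric and keeps the column names as row and column labels, so no separate colnames/rownames step is needed. The diagonal counts everyone who selected that column, including people who also selected others.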

marco
  • 80
  • 2
  • 7
  • This is the output of my long winded code. I'm just looking for a cleaner, more reproducible version. – The Chez Aug 21 '18 at 22:48
  • You said "This is the output of my long winded code.". I do not understand what "This" is referring to. – marco Aug 22 '18 at 07:48
  • I can't add images in yet but the matrix link is to what I was referring. I'm trying to do a count/sum by column and row that transforms into a matrix. i.e. [,1], [,1]+[,2], [,1]+[,3].... – The Chez Aug 22 '18 at 15:40