-1

I have a large data set of hospital discharge records. There are procedure codes for each discharge, with numerous columns containing the codes (principle code, other 1, other 2...other 24). I would like to get a frequency list for 20 specific codes, so I need to get the frequency across multiple columns. Any help would be appreciated!

Example:

#Sample Data
ID <- c(112,113,114,115)
Sex <- c(1,0,1,0)
Princ_Code <- c(1,2,5,3)
Oth_Code_1 <- c(5,7,8,1)
Oth_Code_2 <- c(2,10,12,9)
discharges <- data.frame(ID,Sex,Princ_Code,Oth_Code_1, Oth_Code_2)

I'd like to get a frequency count of specific codes across the columns. Something like:

x   freq
1   2
2   2
3   1
12  1
user2554330
  • 37,248
  • 4
  • 43
  • 90
W. Bensken
  • 11
  • 2
  • 2
    Please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with data and expected output. – neilfws Mar 07 '18 at 22:51
  • I've updated with a sample of some data with what I'd like the intended output to be. I've tried doing a count() for each column and adding the subsetting it based on the specific codes I'm looking for. Honestly, beyond this I was stumped. I am new to R. – W. Bensken Mar 07 '18 at 23:19

1 Answers1

1

One way to think about this problem is to transform the data from a wide format (multiple columns with identically-typed data) to a tall format (where each column is a fairly different type from the others). I'll demonstrate using tidyr, though there are base and data.table methods as well.

out <- tidyr::gather(discharges, codetype, code, -ID, -Sex)
out
#     ID Sex   codetype code
# 1  112   1 Princ_Code    1
# 2  113   0 Princ_Code    2
# 3  114   1 Princ_Code    5
# 4  115   0 Princ_Code    3
# 5  112   1 Oth_Code_1    5
# 6  113   0 Oth_Code_1    7
# 7  114   1 Oth_Code_1    8
# 8  115   0 Oth_Code_1    1
# 9  112   1 Oth_Code_2    2
# 10 113   0 Oth_Code_2   10
# 11 114   1 Oth_Code_2   12
# 12 115   0 Oth_Code_2    9

Do you see how transforming from "wide" to "tall" makes the problem seem a lot simpler? From here, you could use table or xtabs

table(out$code)
#  1  2  3  5  7  8  9 10 12 
#  2  2  1  2  1  1  1  1  1 
xtabs(~code, data=out)
# code
#  1  2  3  5  7  8  9 10 12 
#  2  2  1  2  1  1  1  1  1 

or you can continue with dplyr pipes and tidyr:

library(dplyr)
library(tidyr)

discharges %>%
  gather(codetype, code, -ID, -Sex) %>%
  group_by(code) %>%
  tally()
# # A tibble: 9 × 2
#    code     n
#   <dbl> <int>
# 1     1     2
# 2     2     2
# 3     3     1
# 4     5     2
# 5     7     1
# 6     8     1
# 7     9     1
# 8    10     1
# 9    12     1
r2evans
  • 141,215
  • 6
  • 77
  • 149