
I have a large ICD-10 dataset and I would like to group the codes into subgroups and get a sum for each subgroup.

For example, I have the codes 'JAL01', 'JAL20' and 'JAL21', and I need the sum over all codes starting with 'JAL'. How do I do that?

zx8754
Elina
  • Would this be of any help? https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group – heilala Jul 28 '20 at 08:12
  • Does it always start with 3 letters? – zx8754 Jul 28 '20 at 08:26
  • @zx8754 Yes, if I do it the way I was planning to. The more letters I use, the more specific the group is. – Elina Jul 28 '20 at 08:52
  • @heilala This isn't working for me. I guess I would first need to somehow cut these 'JAL01' codes into subgroups... – Elina Jul 28 '20 at 08:56
  • @Elina Could you share a short example of the dataframe you are working with and maybe some code you used trying to solve the problem? It might be easier to recommend solutions. – heilala Jul 28 '20 at 09:10
  • @Elina then the answer below by zx8754 should do it – heilala Jul 29 '20 at 11:54

1 Answer

Extract the first 3 characters of each code with substr(), then group by that prefix and sum:

# example data
df1 <- data.frame(icd = c("JAL01", "JAL20", "JAL21", "foo11", "foo22"),
                  x = 1:5)

# get 1st 3 letters
df1$grp <- substr(df1$icd, 1, 3)

# get sum per group
aggregate(x ~ grp, df1, sum)
#   grp x
# 1 foo 9
# 2 JAL 6
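
Since the comments mention that using more letters gives a more specific group, the same idea can be sketched with the prefix length as a parameter. This is only a minimal sketch: `sum_by_prefix` is a hypothetical helper name, and the column names `icd` and `x` are assumed to match the example data above. The last lines show how the total for a single prefix such as 'JAL' might be computed directly with `startsWith()`.

```r
# Example data, same shape as above; stringsAsFactors = FALSE keeps `icd`
# as character on R versions before 4.0 (startsWith() needs character input)
df1 <- data.frame(icd = c("JAL01", "JAL20", "JAL21", "foo11", "foo22"),
                  x = 1:5,
                  stringsAsFactors = FALSE)

# Hypothetical helper: sum `x` per group, where the group is the
# first `n` characters of the code in column `icd`
sum_by_prefix <- function(df, n = 3) {
  df$grp <- substr(df$icd, 1, n)
  aggregate(x ~ grp, data = df, FUN = sum)
}

res3 <- sum_by_prefix(df1, n = 3)  # coarse groups: "foo", "JAL"
res4 <- sum_by_prefix(df1, n = 4)  # finer groups: "foo1", "foo2", "JAL0", "JAL2"

# Total for one prefix only, without building a grouping column
jal_total <- sum(df1$x[startsWith(df1$icd, "JAL")])
jal_total
# [1] 6
```

If the real data has a different column layout, the `icd` and `x` names inside the helper would need to be adjusted accordingly.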
zx8754