
Using this code on my dataset, I was able to separate out each specific ICD10Code for each PatientId:

data.code<-data.1 %>% group_by(ICD10Code,PatientId) %>%
  summarise(ReferralSource=first(ReferralSource),
    NextAppt=first(NextAppt), Age=max(Age),
    InsuranceName=toString(unique(InsuranceName)))


ICD10Code PatientId ReferralSource   NextAppt   Age InsuranceName      
 <fct>     <fct>     <fct>            <fct>    <int> <chr>              
 1 ""        397       Piedmont Hospit… N           51 SLIDING FEE SCHEDU…
 2 ""        1770      St Francis       N           42 SLIDING FEE SCHEDU…
 3 ""        9787      St Francis       Y           55 *SELF PAY*, SLIDIN…
 4 ""        18872     Piedmont Hospit… Y           50 SLIDING FEE SCHEDU…
 5 ""        20172     St Francis       Y           55 Medicaid-GA (Medic…
 6 A084      1856      Piedmont Hospit… N           35 *SELF PAY*, SLIDIN…
 7 A609      10937     Piedmont Hospit… Y           31 SLIDING FEE SCHEDU…
 8 A749      18705     St Francis       N           38 SLIDING FEE SCHEDU…
 9 B001      19100     St Francis       N           37 SLIDING FEE SCHEDU…
10 B079      19076     St Francis       N           47 Medicaid-GA (Medic…
11 B182      9690      St Francis       N           49 *SELF PAY*, SLIDIN…
12 B20       18990     St Francis       N           53 Medicaid-GA (Medic…
13 B349      20235     Piedmont Hospit… N           35 SLIDING FEE SCHEDU…
14 B351      4781      St Francis       N           36 BCBS-GA            
15 B351      7466      St Francis       Y           47 SLIDING FEE SCHEDU…
16 B351      18820     Piedmont Hospit… Y           25 BCBS-GA            
17 B353      18990     St Francis       N           53 Medicaid-GA (Medic…
18 B370      397       Piedmont Hospit… N           51 SLIDING FEE SCHEDU…
19 B370      19112     St Francis       Y            0 *SELF PAY*, CareSo…
20 B370      20291     St Francis       Y            0 BCBS-GA (POS), SLI…

What I need to do now, and am not sure how, is loop through the ICD10Code column and calculate the mean of the Age column for each unique ICD10 code while keeping duplicates.

For example, in the data above, ICD10Code B351 occurs three times and the corresponding ages are 36, 47, and 25. I want to calculate the mean age for that code. I think I need a for-loop and a new data frame consisting of the code and the mean age. How would I go about doing this?

smci
Brandon
  • The question is a duplicate of every existing question looking for a groupby-summarize/mutate. **Your confusion is simply about the difference between summarize vs mutate**. (And when you said *"I do not need to remove any duplicates"* you meant to say *"must keep duplicates"*) – smci Dec 12 '19 at 20:13
  • Duplicate: there are [401 existing questions on \[r\] summarize mutate is:question](https://stackoverflow.com/search?q=%5Br%5D+summarize+mutate+is%3Aquestion). Some of them explain the difference between mutate vs summarize. – smci Dec 12 '19 at 20:16
  • Appreciate your time providing edits and explanations. Sorry for confusion. – Brandon Dec 12 '19 at 20:20

2 Answers


Try:

library(tidyverse)
data.code %>% group_by(ICD10Code) %>% summarise(mAge = mean(Age, na.rm = TRUE))

And if you want to attach it to your other code:

data.1 %>%
  group_by(ICD10Code, PatientId) %>%
  summarise(ReferralSource = first(ReferralSource), NextAppt = first(NextAppt),
            Age = max(Age), InsuranceName = toString(unique(InsuranceName))) %>%
  ungroup() %>%
  group_by(ICD10Code) %>% mutate(mAge = mean(Age, na.rm = TRUE))
akash87
  • Wouldn't this remove the duplicated ICD10codes? As the question specified not to remove them. – Annet Dec 12 '19 at 19:57
  • Yes, mea culpa. I did update it to be `mutate` by group as opposed to `summarise` by group. – akash87 Dec 12 '19 at 19:58

Using dplyr, you might want to group on your ICD10Code and add a column that represents the average age:

data.code <- data.code %>% group_by(ICD10Code) %>% mutate(average_age = mean(Age))

This way you will not lose any rows, which I assume is what you want given the "I do not need to remove any duplicates" part and the fact that the other columns contain different values for the same ICD10Code. If you use summarise instead (which is another option), you will collapse the data to one row per ICD10Code and lose the duplicates.
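
To make the difference concrete, here is a minimal sketch on a toy data frame. The name `toy` and the result names `with_mean` and `per_code` are made up for illustration; the values are copied from the B351 and B353 rows in the question, and only the columns needed for the mean are kept.

```r
library(dplyr)

# Toy data (hypothetical name) built from the B351 and B353 rows of the question
toy <- tibble(
  ICD10Code = c("B351", "B351", "B351", "B353"),
  PatientId = c(4781, 7466, 18820, 18990),
  Age       = c(36, 47, 25, 53)
)

# mutate(): keeps all four rows and repeats the group mean on every row
with_mean <- toy %>%
  group_by(ICD10Code) %>%
  mutate(average_age = mean(Age, na.rm = TRUE)) %>%
  ungroup()

# summarise(): collapses the data to one row per ICD10Code
per_code <- toy %>%
  group_by(ICD10Code) %>%
  summarise(average_age = mean(Age, na.rm = TRUE))
```

Here `with_mean` still has four rows, with 36 as the average age on each B351 row, while `per_code` has only two rows, one per code. No for-loop is needed in either case because `group_by()` handles the per-code split.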

Annet