
I have written a function that increments the values in certain columns of a specific row. It subsets my dataframe step by step to find the row it needs (by sex, then age, then deprivation, then number of partners), adds 1 to whichever columns need it (depending on the test result), and then recalculates the risk (my code is for STI testing).

However, this does not update my existing dataframe with the new values. Instead, it creates a new variable, patientRow, that holds them. How can I incorporate these values into my existing dataframe? Thanks!

adaptRisk <- function(dataframe, sexNum, ageNum, deprivationNum,
                      partnerNum, testResult) {
  sexRisk <- subset(dataframe, sex == sexNum)
  ageRisk <- subset(sexRisk, age == ageNum)
  depRisk <- subset(ageRisk, deprivation == deprivationNum)
  patientRow <- subset(depRisk, partners == partnerNum)
  if (testResult == "positive") {
    patientRow$tested <- patientRow$tested + 1
    patientRow$infected <- patientRow$infected + 1
  } else if (testResult == "negative") {
    patientRow$tested <- patientRow$tested + 1
  }
  patientRow <- transform(patientRow, risk = infected / tested)
  return(patientRow)
}

This is the head of my dataframe to give you an idea:

  sex    age    deprivation partners tested infected risk
1 Female 16-19  1-2         0-1      132    1        0.007575758
2 Female 16-19  1-2         2        25     1        0.040000000
3 Female 16-19  1-2         >=3      30     1        0.033333333
4 Female 16-19  3           0-1      80     2        0.025000000
5 Female 16-19  3           2        12     1        0.083333333
6 Female 16-19  3           >=3      18     1        0.055555556

The dput of my data is:

structure(list(
  sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("Female", "Male"), class = "factor"),
  age = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("16-19", "20-24", "25-34", "35-44"), class = "factor"),
  deprivation = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
    .Label = c("1-2", "3", "4-5"), class = "factor"),
  partners = structure(c(2L, 3L, 1L, 2L, 3L, 1L),
    .Label = c(">=3", "0-1", "2"), class = "factor"),
  tested = c(132L, 25L, 30L, 80L, 12L, 18L),
  infected = c(1L, 1L, 1L, 2L, 1L, 1L),
  uninfected = c(131L, 24L, 29L, 78L, 11L, 17L),
  risk = c(0.00757575757575758, 0.04, 0.0333333333333333,
    0.025, 0.0833333333333333, 0.0555555555555556)),
  .Names = c("sex", "age", "deprivation", "partners", "tested",
    "infected", "uninfected", "risk"),
  row.names = c(NA, 6L), class = "data.frame")

An example call to the function:

adaptRisk(data, "Female", "16-19", 3, 2, "positive")
     sex   age deprivation partners tested infected uninfected      risk
5 Female 16-19           3        2     13        2         11 0.1538462
J. Win.
picador
    Could you create a minimal working example and show more clearly what exactly you are doing? Which statement are you using to run your function? And dput(head(yourdataframe)) would help. You can look here to see what I mean by a [minimal working example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – phiver May 13 '18 at 14:57
  • It looks like the output of your function is a data frame whose columns will not match the columns in the original data frame. Therefore, you will have problems because you are "incorporating" two data frames whose columns don't exactly match. – J. Win. May 13 '18 at 15:17
    @phiver Thank you for your reply! I have added the dput, would really appreciate the help! – picador May 13 '18 at 16:38
  • @J.Win. I don't understand what you mean. Subsetting the dataframe keeps the columns, and I haven't added or removed anything. When I print patientRow it has the same columns as my dataframe – picador May 13 '18 at 16:40

2 Answers


I have adjusted your function (see below) using base R syntax. It does the job, but it is not the most beautiful code.

Issue: the chain of subset() calls creates several extra, unneeded data.frames instead of updating the values in place where the conditions match. And the function returned a different data.frame (just the matching row), so the existing data.frame was never updated.
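A minimal illustration of why the original data.frame never changes, using a hypothetical bumpTested() function and toy counts data (neither is from the question). R functions work on a copy of the arguments they receive, so a change only survives if the function returns the modified copy and the caller reassigns it:

```r
# Hypothetical helper, for illustration only: R passes arguments by value,
# so this assignment changes the function's local copy, not the caller's object.
bumpTested <- function(df) {
  df$tested <- df$tested + 1
  df  # the modified copy is only kept if it is returned...
}

counts <- data.frame(group = c("a", "b"), tested = c(10L, 20L))

bumpTested(counts)            # prints an updated copy
print(counts$tested)          # the original is unchanged: 10 20

counts <- bumpTested(counts)  # ...and reassigned by the caller
print(counts$tested)          # now 11 21
```

This is why the fixed function below returns the whole data.frame and is called as df <- adaptRisk(df, ...).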

I adjusted it so that the filters are applied directly to the columns you want to change.

transform() might have unintended side effects, and you were recalculating the whole risk column. Now only the affected value is recalculated.

You might want to build in some warnings or stops in case the filters return more than one record.

You can now use df <- adaptRisk(df, "Female", "16-19", "3", "2", "positive") to write the updated values back into the data.frame you supply to the function.

Examples:

# affects row 5
adaptRisk(df, "Female", "16-19", "3", "2", "positive") 
     sex   age deprivation partners tested infected uninfected        risk
1 Female 16-19         1-2      0-1    132        1        131 0.007575758
2 Female 16-19         1-2        2     25        1         24 0.040000000
3 Female 16-19         1-2      >=3     30        1         29 0.033333333
4 Female 16-19           3      0-1     80        2         78 0.025000000
5 Female 16-19           3        2     13        2         11 0.153846154
6 Female 16-19           3      >=3     18        1         17 0.055555556

# affects row 5    
adaptRisk(df, "Female", "16-19", "3", "2", "negative")
     sex   age deprivation partners tested infected uninfected        risk
1 Female 16-19         1-2      0-1    132        1        131 0.007575758
2 Female 16-19         1-2        2     25        1         24 0.040000000
3 Female 16-19         1-2      >=3     30        1         29 0.033333333
4 Female 16-19           3      0-1     80        2         78 0.025000000
5 Female 16-19           3        2     13        1         11 0.076923077
6 Female 16-19           3      >=3     18        1         17 0.055555556

Function:

adaptRisk <- function(data, sexNum, ageNum, deprivationNum,
                      partnerNum, testResult) {
  # Logical index of the row(s) matching all four risk factors
  rows <- data$sex == sexNum &
    data$age == ageNum &
    data$deprivation == deprivationNum &
    data$partners == partnerNum

  if (testResult == "positive") {
    data$tested[rows] <- data$tested[rows] + 1
    data$infected[rows] <- data$infected[rows] + 1
    data$risk[rows] <- data$infected[rows] / data$tested[rows]
  } else if (testResult == "negative") {
    data$tested[rows] <- data$tested[rows] + 1
    data$risk[rows] <- data$infected[rows] / data$tested[rows]
  }
  return(data)
}
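Following up on the suggestion above about building in stops, here is a hedged sketch of a stricter variant. The name adaptRiskChecked is hypothetical and not part of this answer. It stops unless exactly one row matches, and it also updates the uninfected column, which neither version of adaptRisk touches (note that the negative example above leaves uninfected at 11 although tested is 13 and infected is 1). This assumes uninfected is meant to equal tested minus infected, as it does in every row of the posted data.

```r
# Sketch only: a hypothetical, stricter variant of adaptRisk().
adaptRiskChecked <- function(data, sexNum, ageNum, deprivationNum,
                             partnerNum, testResult) {
  # Reject anything other than "positive" or "negative"
  testResult <- match.arg(testResult, c("positive", "negative"))

  # Row numbers matching all four risk factors (which() drops NA comparisons)
  rows <- which(data$sex == sexNum &
                  data$age == ageNum &
                  data$deprivation == deprivationNum &
                  data$partners == partnerNum)
  if (length(rows) != 1L) {
    stop("expected exactly one matching row, found ", length(rows))
  }

  data$tested[rows] <- data$tested[rows] + 1L
  if (testResult == "positive") {
    data$infected[rows] <- data$infected[rows] + 1L
  }
  # Assumption: uninfected should always equal tested - infected
  data$uninfected[rows] <- data$tested[rows] - data$infected[rows]
  data$risk[rows] <- data$infected[rows] / data$tested[rows]
  data
}

# The six rows from the question's dput()
df <- data.frame(
  sex = factor(rep("Female", 6), levels = c("Female", "Male")),
  age = factor(rep("16-19", 6), levels = c("16-19", "20-24", "25-34", "35-44")),
  deprivation = factor(rep(c("1-2", "3"), each = 3), levels = c("1-2", "3", "4-5")),
  partners = factor(rep(c("0-1", "2", ">=3"), 2), levels = c(">=3", "0-1", "2")),
  tested = c(132L, 25L, 30L, 80L, 12L, 18L),
  infected = c(1L, 1L, 1L, 2L, 1L, 1L),
  uninfected = c(131L, 24L, 29L, 78L, 11L, 17L)
)
df$risk <- df$infected / df$tested

df <- adaptRiskChecked(df, "Female", "16-19", "3", "2", "negative")
df[5, ]
```

With this variant the negative example leaves row 5 at tested 13, infected 1 and uninfected 12, and a call such as adaptRiskChecked(df, "Male", "16-19", "3", "2", "positive") fails loudly because no row matches, instead of silently returning the data unchanged.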
phiver
  • That was really helpful, thank you for doing that! I understand what you mean by subsetting, this is a much better (albeit slightly harder to read) way! – picador May 13 '18 at 18:58

The function outputs a single row that, apparently, you intend to use in place of the original row(s). You could replace the original row by doing something like this:

## original data frame is named patientData
patientRow <- adaptRisk(patientData, "Female", "16-19", 3, 2, "positive")
patientData[row.names(patientRow), ] <- patientRow
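A self-contained run of this approach, as a sketch: it reuses the question's original adaptRisk() and rebuilds the six posted rows as patientData (the name comes from the comment above). It works because subset() keeps the original row names, so row.names(patientRow) points back at the row to overwrite.

```r
# The question's original function (layout tidied): returns only the matching row
adaptRisk <- function(dataframe, sexNum, ageNum, deprivationNum,
                      partnerNum, testResult) {
  sexRisk <- subset(dataframe, sex == sexNum)
  ageRisk <- subset(sexRisk, age == ageNum)
  depRisk <- subset(ageRisk, deprivation == deprivationNum)
  patientRow <- subset(depRisk, partners == partnerNum)
  if (testResult == "positive") {
    patientRow$tested <- patientRow$tested + 1
    patientRow$infected <- patientRow$infected + 1
  } else if (testResult == "negative") {
    patientRow$tested <- patientRow$tested + 1
  }
  transform(patientRow, risk = infected / tested)
}

# The six rows from the question's dput()
patientData <- data.frame(
  sex = factor(rep("Female", 6), levels = c("Female", "Male")),
  age = factor(rep("16-19", 6), levels = c("16-19", "20-24", "25-34", "35-44")),
  deprivation = factor(rep(c("1-2", "3"), each = 3), levels = c("1-2", "3", "4-5")),
  partners = factor(rep(c("0-1", "2", ">=3"), 2), levels = c(">=3", "0-1", "2")),
  tested = c(132L, 25L, 30L, 80L, 12L, 18L),
  infected = c(1L, 1L, 1L, 2L, 1L, 1L),
  uninfected = c(131L, 24L, 29L, 78L, 11L, 17L)
)
patientData$risk <- patientData$infected / patientData$tested

patientRow <- adaptRisk(patientData, "Female", "16-19", 3, 2, "positive")
row.names(patientRow)  # "5": subset() keeps the original row name

# Overwrite the matching row in the original data frame, matched by row name
patientData[row.names(patientRow), ] <- patientRow
patientData[5, ]
```

Row 5 of patientData now holds tested 13, infected 2 and risk 2/13, while the other rows are untouched.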
J. Win.