I'm obviously a novice at writing R code. I have tried several solutions to my problem from Stack Overflow, but I'm still stuck.

My dataset is carcinoid, patients with a small bowel cancer, with multiple variables.

I would like to know how the following variables are distributed:

carcinoid$met_any - with metastatic disease 1=yes, 2=no (computed variable)
carcinoid$liver_mets_y_n - liver metastases 1=yes, 2=no
carcinoid$regional_lymph_nodes_y_n - regional lymph nodes 1=yes, 2=no
carcinoid$peritoneal_carcinosis_y_n - peritoneal carcinosis 1=yes, 2=no

I have tried this solution, which comes close to the result I want:

ddply(carcinoid, .(carcinoid$met_any), summarize,
      livermetastases=sum(carcinoid$liver_mets_y_n=="1"),
      regionalmets=sum(carcinoid$regional_lymph_nodes_y_n=="1"),
      pc=sum(carcinoid$peritoneal_carcinosis_y_n=="1"))

with the result being:

  carcinoid$met_any livermetastases regionalmets pc
1                 1              21           46  7
2                 2              21           46  7

Now, I expected the row with 2 (= no metastases) to be empty. I would also like the rows in the carcinoid$met_any column to show the number of patients.

If someone could help me it would be very much appreciated! John

Edit

My dataset looks like this, although in the real data the column numbers are 1, 43, 28, 31 and 33 (1 = yes, 2 = no):

case_nr          met_any     liver_mets_y_n   regional_lymph_nodes_y_n pc
1                   1               1                  1                2
2                   1               2                  1                2               
3                   2               2                  2                2
4                   1               2                  1                1               
5                   1               2                  1                1

Desired output: I want to count the number of 1s and 2s. If it works, all 1s should end up in the met_any=1 row.

           nr   liver_mets        regional_lymph_nodes        pc
met_any=1  4         1                    4                    2
met_any=2  1         4                    1                    3

EDIT

Although I was probably very unclear in my question, with your help I could make the table I needed!

setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]

gives

   met_any lymph_nod liver_met paraortal extrahep
1:      50        46        21         6       15
2:     111       115       140       151      146

I am very grateful! @mtoto provided the solution. John

  • can you `dput` an example of your original data please – mtoto Jan 26 '16 at 11:07
  • Several things. You're using `plyr`, which is getting a bit outdated/obsolete, so you might want to spend your time learning dplyr or data.table instead. If you want to stay with plyr: you split your dataset by met_any, but then accessed the entire dataset. Try what happens if you use `sum(liver_mets_y_n=="1")`. – Heroka Jan 26 '16 at 11:12
  • i'm guessing this is not what you mean..dput(carcinoid$met_any) dput(carcinoid$met_any) c("1", "2", "1", "2", "2", "2", "2", "2", "1", "2", "1", "2", "2", "1", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "2", "1", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "2", "2", "1", "2", "1", "2", "2", "1", "2", "2", "2", "1", "2", "2", "1", "2", "1", "1", "1", – John Eriksson Jan 26 '16 at 11:14
  • I'm happy to use anything that works, if data.table is better I'll try that – John Eriksson Jan 26 '16 at 11:15
  • Use `sum` instead of `mean` in the linked dupe and you're good to go. – David Arenburg Jan 26 '16 at 11:17
  • do dput(carcinoid) and then copypaste that output into your original answer (do this everytime you want help). I agree w/ Heroka's comment. Learn dplyr, no one uses plyr anymore. A good way to do so is to download the package "swirl", and do the dplyr lesson there... you'll be a R-ninja in no time – Amit Kohli Jan 26 '16 at 11:18
  • i guess aggregate would work, but carcinoid contains a lot more variables, how do i select the ones i want to have in my table? – John Eriksson Jan 26 '16 at 11:30
  • i used aggregate(carcinoid[, 43,28,31,33,17], list(carcinoid$met_any), sum), the numbers are column numbers, but I'm doing something wrong.. @DavidArenburg – John Eriksson Jan 26 '16 at 12:04
  • We don't have a reproducible example – David Arenburg Jan 26 '16 at 12:06
  • i added an example of my dataset, does that help? @DavidArenburg – John Eriksson Jan 26 '16 at 12:17
  • Ok, and what's the desired output? For instance, this works on your data set `aggregate(.~ met_any, df[-1], sum)` or `aggregate(.~ met_any, df[1:3], sum)`, depends on which columns you want to operate on. – David Arenburg Jan 26 '16 at 12:20
  • i added desired output @DavidArenburg – John Eriksson Jan 26 '16 at 12:28
  • I don't understand your desired output. It doesn't look like sums to me – David Arenburg Jan 26 '16 at 12:33
  • right, so "1" designates a yes answer, "2" designates a no answer. Summing them up can not work of course. I could remake "1" to "yes" and "2" to "no", for example: `carcinoid$regional_lymph_nodes_y_n[carcinoid$regional_lymph_nodes_y_n=='2'] <-'no'` would that work? @DavidArenburg – John Eriksson Jan 26 '16 at 12:38
  • Are you just looking for `aggregate(.~ met_any, df, length)`? – David Arenburg Jan 26 '16 at 12:40
  • `aggregate(.~ met_any, carcinoid, length)`? no, that does not help – John Eriksson Jan 26 '16 at 12:43
  • Or if you just want to count the `1`s per group, this should be `aggregate(df[-2] == 1, list(df$met_any), sum)` though I still fail to understand your desired output. Anyway, I'm reopening this as I have no idea what you want. Maybe someone else will. – David Arenburg Jan 26 '16 at 12:46
  • hmm, maybe `sapply(df[-1], table)` or `cbind(unique(df[2]), sapply(df[-1], table))` in order to add your other column. – David Arenburg Jan 26 '16 at 12:52
  • `aggregate(carcinoid[-2] == 1, list(carcinoid$met_any), sum)` gives me the number of 1:s in each column of my data. I can use those numbers and make a new table, so it solves my problem, even though i would have liked R to give me the table, so that i could recompute it if data changes. @DavidArenburg – John Eriksson Jan 26 '16 at 13:02
  • @AmitKohli. I tried to make my question easier to understand, any thoughts? – John Eriksson Jan 26 '16 at 13:19
  • @JohnEriksson we're struggling to see what you need. It's a step in the right direction (and common practice) to do `dput(carcinoid)` and then paste the results in your question. That way we can have your data and can play with it to give you a better solution. – Amit Kohli Jan 26 '16 at 14:02
  • hmm, haven't I already showed `sapply(df[-1], table)` or am I missing something? – David Arenburg Jan 26 '16 at 14:08
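
Heroka's point about the `ddply` call can be illustrated without plyr. The following is a minimal base R sketch, assuming the five example rows from the question are stored in a data frame called `df` (the name used in the comments, not the real `carcinoid` data). Inside each group, `df$regional_lymph_nodes_y_n` still refers to the whole data frame, so every group reports the same total; referring to the group's own column gives per-group counts. The last lines use the `aggregate` idea from the comments and add the group size, which may be close to what was originally asked for.

```r
# Example rows copied from the question (1 = yes, 2 = no); `df` is an assumed name
df <- data.frame(
  case_nr                  = 1:5,
  met_any                  = c(1, 1, 2, 1, 1),
  liver_mets_y_n           = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc                       = c(2, 2, 2, 1, 1)
)

# One data frame per value of met_any
groups <- split(df, df$met_any)

# Mistake: the full column is summed inside every group, so all groups agree
wrong <- sapply(groups, function(g) sum(df$regional_lymph_nodes_y_n == 1))

# Fix: sum only the column of the current group `g`
right <- sapply(groups, function(g) sum(g$regional_lymph_nodes_y_n == 1))

# Per-group "yes" counts for several columns, selected by name
vars <- c("liver_mets_y_n", "regional_lymph_nodes_y_n", "pc")
yes_counts <- aggregate(df[vars] == 1, by = list(met_any = df$met_any), FUN = sum)

# Add the number of patients in each group
yes_counts$n <- as.vector(table(df$met_any)[as.character(yes_counts$met_any)])
```

With these five rows, `wrong` is 4 for both groups, while `right` is 4 for `met_any = 1` and 0 for `met_any = 2`, which matches the expectation that the "no metastases" row should be empty.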

1 Answer

Based on your example data, this data.table approach works:

library(data.table)
setDT(df)[, lapply(.SD, table), .SDcols = 2:5]

#    met_any liver_mets_y_n regional_lymph_nodes_y_n pc
# 1:       4              1                        4  2
# 2:       1              4                        1  3
mtoto
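
One possible caveat with tabulating each column separately (as in `lapply(.SD, table)` or `sapply(df[-1], table)` from the comments): if a column happens to contain only 1s or only 2s, its table has a single entry, and the per-column results can no longer be combined cleanly into one two-row table. A workaround, sketched here in base R under the assumption that every variable is coded 1 = yes and 2 = no, is to fix both levels before tabulating. The helper name `count_yes_no` is made up for this example, and the `pc` column is deliberately altered to show the edge case.

```r
# Example rows from the question, with `pc` changed to contain only 2s
# (an assumption made here to demonstrate the edge case)
df <- data.frame(
  case_nr                  = 1:5,
  met_any                  = c(1, 1, 2, 1, 1),
  liver_mets_y_n           = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc                       = c(2, 2, 2, 2, 2)
)

# Hypothetical helper: tabulate with both codes always present,
# so a missing code is reported as 0 instead of being dropped
count_yes_no <- function(x) table(factor(x, levels = c(1, 2)))

# Select columns by name rather than by position
vars <- c("met_any", "liver_mets_y_n", "regional_lymph_nodes_y_n", "pc")

# Every table now has length 2, so sapply() simplifies to a 2-row matrix
counts <- sapply(df[vars], count_yes_no)
```

Selecting by name instead of by position (such as `c(43, 28, 31, 33, 17)`) also keeps the call valid if columns are later added to or reordered in the data.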