Making a table with multiple columns in R

Question

I'm obviously a novice at writing R code. I have tried multiple solutions to my problem from Stack Overflow, but I'm still stuck.

My dataset, carcinoid, contains patients with a small bowel cancer and has multiple variables.

I would like to know how the following variables are distributed:

carcinoid$met_any - metastatic disease, 1=yes, 2=no (computed variable)
carcinoid$liver_mets_y_n - liver metastases, 1=yes, 2=no
carcinoid$regional_lymph_nodes_y_n - regional lymph nodes, 1=yes, 2=no
carcinoid$peritoneal_carcinosis_y_n - peritoneal carcinosis, 1=yes, 2=no

I have tried this solution, which is close to the result I want:

ddply(carcinoid, .(carcinoid$met_any), summarize,
      livermetastases=sum(carcinoid$liver_mets_y_n=="1"),
      regionalmets=sum(carcinoid$regional_lymph_nodes_y_n=="1"),
      pc=sum(carcinoid$peritoneal_carcinosis_y_n=="1"))

with the result being:

  carcinoid$met_any livermetastases regionalmets pc
1                 1              21           46  7
2                 2              21           46  7

Now, I expected the row with 2 (= no metastases) to be empty. I would also like each row to give the number of patients in that carcinoid$met_any group.

If someone could help me it would be very much appreciated! John

Edit

A sample of my dataset (in the real data these are columns 1, 43, 28, 31 and 33):
1=yes, 2=no

case_nr  met_any  liver_mets_y_n  regional_lymph_nodes_y_n  pc
1        1        1               1                         2
2        1        2               1                         2
3        2        2               2                         2
4        1        2               1                         1
5        1        2               1                         1

Desired output: I want to count the number of 1s and 2s. If it works, all the 1s should end up in the met_any=1 row.

           nr   liver_mets        regional_lymph_nodes        pc
met_any=1  4         1                    4                    2
met_any=2  1         4                    1                    3

EDIT

Although I was probably very unclear in my question, with your help I could make the table I needed!

setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]

gives

     met_any lymph_nod liver_met     paraortal            extrahep
1:      50      46       21              6               15
2:     111     115      140             151              146

I am very grateful! @mtoto provided the solution. John

Several things. You're using `plyr`, which is getting a bit outdated, so you might want to spend your time learning dplyr or data.table instead. If you want to stay with plyr: you split your dataset by met_any, but then accessed the entire dataset. See what happens if you use `sum(liver_mets_y_n=="1")`. — Heroka, Jan 26 '16 at 11:12
i'm guessing this is not what you mean..dput(carcinoid$met_any) dput(carcinoid$met_any) c("1", "2", "1", "2", "2", "2", "2", "2", "1", "2", "1", "2", "2", "1", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "2", "1", "2", "1", "2", "2", "2", "2", "1", "2", "2", "2", "2", "2", "2", "1", "2", "1", "2", "2", "1", "2", "2", "2", "1", "2", "2", "1", "2", "1", "1", "1", — John Eriksson, Jan 26 '16 at 11:14
I'm happy to use anything that works, if data.table is better I'll try that — John Eriksson, Jan 26 '16 at 11:15
Use `sum` instead of `mean` in the linked dupe and you're good to go. — David Arenburg, Jan 26 '16 at 11:17
Do `dput(carcinoid)` and then copy and paste that output into your original question (do this every time you want help). I agree with Heroka's comment. Learn dplyr, no one uses plyr anymore. A good way to do so is to download the package "swirl" and do the dplyr lesson there... you'll be an R ninja in no time — Amit Kohli, Jan 26 '16 at 11:18
I guess aggregate would work, but carcinoid contains a lot more variables. How do I select the ones I want to have in my table? — John Eriksson, Jan 26 '16 at 11:30
I used `aggregate(carcinoid[, 43,28,31,33,17], list(carcinoid$met_any), sum)`, where the numbers are column numbers, but I'm doing something wrong. @DavidArenburg — John Eriksson, Jan 26 '16 at 12:04
I added an example of my dataset. Does that help? @DavidArenburg — John Eriksson, Jan 26 '16 at 12:17
Ok, and what's the desired output? For instance, this works on your data set `aggregate(.~ met_any, df[-1], sum)` or `aggregate(.~ met_any, df[1:3], sum)`, depends on which columns you want to operate on. — David Arenburg, Jan 26 '16 at 12:20
I don't understand your desired output. It doesn't look like sums to me — David Arenburg, Jan 26 '16 at 12:33
Right, so "1" designates a yes answer and "2" designates a no answer. Summing them up cannot work, of course. I could recode "1" to "yes" and "2" to "no", for example: `carcinoid$regional_lymph_nodes_y_n[carcinoid$regional_lymph_nodes_y_n=='2'] <-'no'`. Would that work? @DavidArenburg — John Eriksson, Jan 26 '16 at 12:38
Are you just looking for `aggregate(.~ met_any, df, length)`? — David Arenburg, Jan 26 '16 at 12:40
`aggregate(.~ met_any, carcinoid, length)`? no, that does not help — John Eriksson, Jan 26 '16 at 12:43
Or if you just want to count the `1`s per group, this should be `aggregate(df[-2] == 1, list(df$met_any), sum)` though I still fail to understand your desired output. Anyway, I'm reopening this as I have no idea what you want. Maybe someone else will. — David Arenburg, Jan 26 '16 at 12:46
hmm, maybe `sapply(df[-1], table)` or `cbind(unique(df[2]), sapply(df[-1], table))` in order to add your other column. — David Arenburg, Jan 26 '16 at 12:52
`aggregate(carcinoid[-2] == 1, list(carcinoid$met_any), sum)` gives me the number of 1s in each column of my data. I can use those numbers to make a new table, so it solves my problem, even though I would have liked R to give me the table so that I could recompute it if the data changes. @DavidArenburg — John Eriksson, Jan 26 '16 at 13:02
@AmitKohli. I tried to make my question easier to understand, any thoughts? — John Eriksson, Jan 26 '16 at 13:19
@JohnEriksson we're struggling to see what you need. It's a step in the right direction (and common practice) to do `dput(carcinoid)` and then paste the results in your question. That way we can have your data and can play with it to give you a better solution. — Amit Kohli, Jan 26 '16 at 14:02
hmm, haven't I already shown `sapply(df[-1], table)` or am I missing something? — David Arenburg, Jan 26 '16 at 14:08
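
The point raised in the comments above, that `carcinoid$liver_mets_y_n` inside `summarize` refers to the whole data set rather than to the current group, can be sketched in base R. This is only a minimal sketch, assuming the five example rows from the question's edit; the names `df`, `whole_data_yes` and `yes_by_group` are illustrative and not part of the original data.

```r
# Example rows copied from the question's edit (1 = yes, 2 = no)
df <- data.frame(
  case_nr = 1:5,
  met_any = c(1, 1, 2, 1, 1),
  liver_mets_y_n = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc = c(2, 2, 2, 1, 1)
)

# Whole-column count: the same number whatever the group, which is why
# both rows of the ddply output in the question were identical
whole_data_yes <- sum(df$liver_mets_y_n == 1)

# Per-group counts of 1s ("yes") in columns 3 to 5, grouped by met_any
yes_by_group <- aggregate(df[3:5] == 1, list(met_any = df$met_any), sum)

# Number of patients in each met_any group (same sorted group order)
yes_by_group$n <- as.vector(table(df$met_any))

yes_by_group
#   met_any liver_mets_y_n regional_lymph_nodes_y_n pc n
# 1       1              1                        4  2 4
# 2       2              0                        0  0 1
```

On the real data the column selection `df[3:5]` would need to be replaced by the relevant columns of `carcinoid`.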

mtoto · Accepted Answer · 2016-01-26T14:04:12.793

1

Based on your example data, this data.table approach works:

library(data.table)
setDT(df)[,lapply(.SD,table),.SDcols=c(2:5)]

# met_any liver_mets_y_n regional_lymph_nodes_y_n pc
# 1:       4              1                        4  2
# 2:       1              4                        1  3

edited Jan 26 '16 at 14:04

answered Jan 26 '16 at 13:53

hmm, maybe that works @mtoto, but I would need to only include columns 43,28,31,33,17 in the table, and preferably in that order – John Eriksson Jan 26 '16 at 14:01
@JohnEriksson then `.SDcols=c(43,28,31,33,17)` – mtoto Jan 26 '16 at 14:02
`setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]` works beautifully! I am very grateful for your help! @mtoto – John Eriksson Jan 26 '16 at 17:32
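
What `lapply(.SD, table)` does in the answer above is tabulate each selected column on its own, over all patients rather than per `met_any` group. A base R sketch of the same idea follows, assuming the five example rows from the question and selecting columns by name instead of by position. The names `df`, `cols`, `count_yes_no` and `counts` are illustrative assumptions.

```r
# Example rows copied from the question's edit (1 = yes, 2 = no)
df <- data.frame(
  case_nr = 1:5,
  met_any = c(1, 1, 2, 1, 1),
  liver_mets_y_n = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc = c(2, 2, 2, 1, 1)
)

# Columns to tabulate, in the order they should appear in the result
cols <- c("met_any", "liver_mets_y_n", "regional_lymph_nodes_y_n", "pc")

# Tabulate one column with fixed levels, so that a column containing
# no 1s (or no 2s) still yields two counts
count_yes_no <- function(x) table(factor(x, levels = c(1, 2)))

# One column of counts per variable; rows are named "1" and "2"
counts <- sapply(df[cols], count_yes_no)
counts
```

Fixing the factor levels is a defensive choice: without it, a variable where every patient has the same value would produce a table of a different length from the others. With data.table, `.SDcols` also accepts a character vector of column names, so the same `cols` vector could be passed there.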
