How do I add a column to my data table that shows the sum of multiple other columns' values?

Question

I have 8 age categories, with each of them having its own column (i.e. residents_under_5, residents_6_to_12, etc. Each column has a value between 0 and 3, for the number of people in that household in that specific age category. What I want is a new column with which I can plot the total distribution of age of my population on a histogram. So I was thinking of a column that has 66 rows of residents_under_5, 32 rows of residents_6_to_12, etc., for the sum of those categories.

My data looks like this:

What I want is a column e that shows:

e
a
a
a
a
b
b
b
b
b
c
c
c
d
d
d

For the total number of occurences in the other columns.

I've tried declaring new columns with sum(residents_under_5), but that will give me 1 row with 66 (as the sum of that category). I can't plot a histogram with such a column. I hope someone can figure it out!

This is the dput() of the relevant columns

residents_under_5 = c(0, 0, 0, 1, 1, 2), 
residents_6_to_12 = c(0, 0, 0, 0, 0, 0), 
        residents_13_to_18 = c(0, 0, 0, 0, 0, 0), 
residents_19_to_24 = c(0, 
        0, 0, 0, 0, 0), 
residents_25_to_34 = c(0, 1, 2, 0, 1, 0), 
       residents_35_to_49 = c(0, 0, 0, 2, 1, 2), 
residents_50_to_64 = c(0, 
        1, 0, 0, 0, 0), 
residents_65_and_older = c(2, 0, 0, 0, 1, 
        0)

Something like this: https://stackoverflow.com/questions/31461357/sum-of-two-columns-of-data-frame-with-na-values? — AkselA, May 27 '19 at 12:20

Ronak Shah · Accepted Answer · 2019-05-27T13:55:50.390

1

You can unlist the dataframe and calculate frequencies using table and then repeat letters using rep.

rep(letters[seq_len(ncol(df))], colSums(df))

data

df <- data.frame(residents_under_5 = c(0, 0, 0, 1, 1, 2), 
                 residents_6_to_12 = c(0, 0, 0, 0, 0, 0), 
                 residents_13_to_18 = c(0, 0, 0, 0, 0, 0), 
                 residents_19_to_24 = c(0, 0, 0, 0, 0, 0), 
                 residents_25_to_34 = c(0, 1, 2, 0, 1, 0), 
                 residents_35_to_49 = c(0, 0, 0, 2, 1, 2), 
                 residents_50_to_64 = c(0, 1, 0, 0, 0, 0), 
                 residents_65_and_older = c(2, 0, 0, 0, 1, 0))

edited May 27 '19 at 13:55

answered May 27 '19 at 13:14

Ronak Shah

377,200
20
156
213

Does this also work for data table? I have more columns than just these 4, how do I specify only the relevant columns? Do I just write 'names', or what should go instead? thanks! – Ben Gill May 27 '19 at 13:25
@BenGill This is taking into consideration all the columns whether 4 or 100.You might need to subset the columns If you want to ignore them. For example, to ignore 1st column we might do `rep(names(df)[-1], table(unlist(df)[-1]))` – Ronak Shah May 27 '19 at 13:28
Apparently it doesn't work as I have negative values in my data set (error: invalid 'times' argument) – Ben Gill May 27 '19 at 13:37
@BenGill I create a new dataframe using the `dput` of your relevant columns and updated the answer, Can you check now? – Ronak Shah May 27 '19 at 13:44
it gives me 5000 rows of "a" – Ben Gill May 27 '19 at 13:50
@BenGill I tried re-reading your question and updated the answer. Make sure that `df` has only relevant columns. See if that works for you or else I'll delete the post since I am not able to help you. – Ronak Shah May 27 '19 at 13:57
Thank you so much! It worked now. I first created a new data table with only the relevant columns. Your code gave me a vector with the right values. I declared it into a new variable as data frame, and will now plot it on a histogram! – Ben Gill May 27 '19 at 15:42

akrun · Answer 2 · 2019-05-27T14:07:13.107

1

An option in tidyverse would be to get the sum of all columns with summarise_all, gather into 'long' format and uncount the 'value' column

library(tidyverse)
df1 %>% 
   summarise_all(sum) %>%
   gather %>% 
   uncount(value)

data

df1 <- structure(list(a = 0:3, b = c(3L, 3L, 0L, 1L), c = c(2L, 2L, 
2L, 0L), d = c(1L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
  -4L))

edited May 27 '19 at 14:07

answered May 27 '19 at 14:00

akrun

874,273
37
540
662

How do I add a column to my data table that shows the sum of multiple other columns' values?

2 Answers2

data