I have 8 age categories, each in its own column (e.g. residents_under_5, residents_6_to_12, etc.). Each column holds a value between 0 and 3: the number of people in that household who fall into that age category. What I want is a new column that lets me plot the overall age distribution of my population as a histogram. So I was thinking of a column that has 66 rows of residents_under_5, 32 rows of residents_6_to_12, etc., with the number of rows matching the sum of each category.
My data looks like this:
a b c d
0 3 2 1
1 3 2 1
2 0 2 1
3 1 0 0
What I want is a column e that shows:
e
a
a
a
a
b
b
b
b
b
c
c
c
d
d
d
That is, each column name is repeated once for every occurrence counted in that column.
I've tried declaring new columns with sum(residents_under_5), but that gives me a single row containing 66 (the sum of that category), and I can't plot a histogram from such a column. I hope someone can figure it out!
This is the dput() output for the relevant columns:
residents_under_5 = c(0, 0, 0, 1, 1, 2),
residents_6_to_12 = c(0, 0, 0, 0, 0, 0),
residents_13_to_18 = c(0, 0, 0, 0, 0, 0),
residents_19_to_24 = c(0, 0, 0, 0, 0, 0),
residents_25_to_34 = c(0, 1, 2, 0, 1, 0),
residents_35_to_49 = c(0, 0, 0, 2, 1, 2),
residents_50_to_64 = c(0, 1, 0, 0, 0, 0),
residents_65_and_older = c(2, 0, 0, 0, 1, 0)
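
To make the target concrete, here is a minimal sketch of the kind of result I'm after. It assumes the columns above sit in a data frame called df, and the names df, totals, age_long and age_category are made up for illustration. It uses base R's colSums() and rep(), and is only one possible approach, not necessarily the best one.

```r
# Hypothetical data frame built from the dput() columns above
df <- data.frame(
  residents_under_5      = c(0, 0, 0, 1, 1, 2),
  residents_6_to_12      = c(0, 0, 0, 0, 0, 0),
  residents_13_to_18     = c(0, 0, 0, 0, 0, 0),
  residents_19_to_24     = c(0, 0, 0, 0, 0, 0),
  residents_25_to_34     = c(0, 1, 2, 0, 1, 0),
  residents_35_to_49     = c(0, 0, 0, 2, 1, 2),
  residents_50_to_64     = c(0, 1, 0, 0, 0, 0),
  residents_65_and_older = c(2, 0, 0, 0, 1, 0)
)

# Total number of residents in each age category
totals <- colSums(df)

# Repeat each category name once per resident counted in it.
# The factor keeps the original column order and the empty categories.
age_long <- data.frame(
  age_category = factor(rep(names(totals), times = totals),
                        levels = names(totals))
)

# One row per person; the counts per category match the column sums
table(age_long$age_category)
```

Since the resulting column is categorical rather than numeric, hist() would not accept it directly; something like barplot(table(age_long$age_category)) should give the histogram-style plot instead.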