
I am trying to plot trends in the age of university applicants. From various databases I use the data to build the following dataframe:

> AgeGroup <- c("Year", "17","18","19","20", "21", "22", "23", "24", "25to29", "30to39", "40plus"); AgeGroup
 [1] "Year"   "17"     "18"     "19"     "20"     "21"     "22"     "23"     "24"    
[10] "25to29" "30to39" "40plus"

> AGEgroups <- as.data.frame(cbind(a,h,i,j, k, l, m, n, o, p, q, r)); AGEgroups
  a    h      i      j     k     l     m     n    o     p     q     r
1  2004 1053 160450  74600 25778 14317  9761  6995 5589 15902 17171  8351
2  2005 1115 175406  77751 28368 15191 10551  7778 6107 18153 18695  9686
...
9  2012  743 199213  93669 37214 21240 14651 10962 8781 26387 27246 15308
10 2013  702 201821 103356 39185 21557 15242 11226 8707 27326 26887 15442

> colnames(AGEgroups) <- AgeGroup
> AGEgroups

   Year   17     18     19    20    21    22    23   24 25to29 30to39 40plus
1  2004 1053 160450  74600 25778 14317  9761  6995 5589  15902  17171   8351
...

10 2013  702 201821 103356 39185 21557 15242 11226 8707  27326  26887  15442

Then I plot the graph using the ggplot2 library:

> ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
+   geom_area(data = AGEgroups, aes(x=Year, y=h, fill="17 yrs"))+
+   geom_area(data = AGEgroups, aes(x=Year, y=i, fill="18 yrs"))+
+   geom_area(data = AGEgroups, aes(x=Year, y=j, fill="19 yrs"))+

...

I get a graph that generally looks OK (though my attempt to customise the colours failed, and I cannot post the image because I do not have enough reputation points), but only 5 age groups are plotted instead of 11.

When I try to plot them separately using:

ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
  geom_area(data = AGEgroups, aes(x=Year, y=l, fill="21 yrs"))

the majority work fine, but then when I plot:

ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
  geom_area(data = AGEgroups, aes(x=Year, y=m, fill="22 yrs"))

which is the missing group, I get the error message:

Error: unexpected numeric constant in:
"ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
  geom_area(data = AGEgroups, aes(x=Year, y=m, fill="22"

I have been looking at both lines of code and can see no difference in the syntax. The 'm' vector is displayed when I print it. Any ideas why this might be happening?

I do not get the unexpected numeric constant error today after restarting the computer, which means the old "switch on/off" technique solves at least 50% of problems;)

Still, the graph displays 5 instead of 11 variables. The suggested dput(head(AGEgroups)) yields the following output:

structure(list(Year = 2004:2009, `17` = c(1053L, 1115L, 937L,
1023L, 1273L, 1236L), `18` = c(160450L, 175406L, 173806L, 176306L, 
187802L, 197090L), `19` = c(74600L, 77751L, 71285L, 83706L, 89462L, 
97544L), `20` = c(25778L, 28368L, 27003L, 29955L, 36255L, 38451L
), `21` = c(14317L, 15191L, 15464L, 16550L, 19745L, 22110L), 
`22` = c(9761L, 10551L, 10287L, 11498L, 13384L, 15132L),
`23` = c(6995L, 7778L, 7664L, 8054L, 9801L, 11080L), `24` = c(5589L,
6107L, 5948L, 6150L, 7470L, 8810L), `25to29` = c(15902L,
18153L, 18001L, 18833L, 23578L, 27299L), `30to39` = c(17171L,
18695L, 17818L, 17861L, 22643L, 26781L), `40plus` = c(8351L, 
9686L, 9854L, 10141L, 13183L, 15888L)), .Names = c("Year", 
"17", "18", "19", "20", "21", "22", "23", "24", "25to29", "30to39",
"40plus"), row.names = c(NA, 6L), class = "data.frame")
Asiack
  • A lot of this doesn't make any sense. Like why you specify `y=NumerOfApplicants` and `fill=Age.Range` when those variables don't seem to be assigned anywhere. And why you are using `y=m` in the geom_area commands even though you've renamed all the columns. And what exactly did you expect `fill="22 years"` to do? That's not a variable name. But I suppose the reason that you're only seeing 5 is that they are covering each other up because you are plotting them as separate layers. You really need to melt your data. Maybe add `dput(head(AGEgroups))` to make this example reproducible. – MrFlick Jun 28 '14 at 01:33
  • The suggestion that the variables are covering each other seems plausible. Will investigate. I included the dput outcome in the post. – Asiack Jun 28 '14 at 14:14
  • The nonsensical things do not make sense, but they work for me: 1. Somehow y=NumberOfApplicants assigns the title to the y axis. I changed it to "Number of Applicants" now. The standard formula for it gets ignored by my RStudio, and if I do not include it the letter "h" gets displayed there. 2. If I write y=22, which is the variable name, it gets treated as a number and you can see 22 on the y axis. With "m" it works fine. 3. f="22 years" gives each colour its name in the legend. – Asiack Jun 28 '14 at 14:24

1 Answer


I still can't get your code above to run because it's missing all the single-letter variables, and I don't want to define those manually, so I can't reproduce the error.

But a better way to plot your data would be to melt it first.

library(reshape2)
mm<-melt(AGEgroups, id.vars="Year")
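To make the effect of `melt` concrete, here is a minimal sketch on a cut-down copy of the question's data (three years and three age groups taken from the `dput` output; the cut-down itself is only for illustration). It shows the wide table becoming one row per Year and age group, which is the shape the plotting code expects.

```r
library(reshape2)

# Cut-down copy of the question's data: 3 years, 3 age groups.
# check.names = FALSE keeps the numeric column names ("17", "18", "19") as-is.
AGEgroups <- data.frame(
  Year = 2004:2006,
  `17` = c(1053L, 1115L, 937L),
  `18` = c(160450L, 175406L, 173806L),
  `19` = c(74600L, 77751L, 71285L),
  check.names = FALSE
)

# Wide -> long: every non-id column becomes a level of `variable`,
# and its numbers go into `value`.
mm <- melt(AGEgroups, id.vars = "Year")

names(mm)             # "Year" "variable" "value"
nrow(mm)              # 9 (3 years x 3 age groups)
levels(mm$variable)   # "17" "18" "19"
```

Because `variable` is a factor whose levels follow the original column order, the age groups keep that order in the legend and in the stack.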

then plot with

ggplot(mm,aes(x=Year, y=value, fill=variable)) +
  geom_area() + ylab("Number of Applicants") + 
  scale_fill_hue(name = "Age Range", 
    labels=c(paste(17:24, "yrs"),"25 to 29", "30 to 39", "40+"))

which produces a stacked area chart with one band per age group (image not included here).

Here we clearly label the plot using the more standard assignments rather than relying on the side effects of using imaginary variables in the aesthetics. This makes the intention of the code much clearer.
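The comment on the question suggested that the separate layers were covering each other, which can be checked roughly without plotting. The sketch below uses only the six rows from the `dput` output and assumes the layers are drawn in column order (17 first, 40plus last), as the visible part of the question's code suggests. Each separate `geom_area` layer starts from zero and is opaque by default, so a group is treated as fully hidden when some single later layer is taller in every year. That is a sufficient test, not a complete one.

```r
# First six rows of the question's data, copied from the dput output.
AGEgroups <- data.frame(
  Year = 2004:2009,
  `17` = c(1053L, 1115L, 937L, 1023L, 1273L, 1236L),
  `18` = c(160450L, 175406L, 173806L, 176306L, 187802L, 197090L),
  `19` = c(74600L, 77751L, 71285L, 83706L, 89462L, 97544L),
  `20` = c(25778L, 28368L, 27003L, 29955L, 36255L, 38451L),
  `21` = c(14317L, 15191L, 15464L, 16550L, 19745L, 22110L),
  `22` = c(9761L, 10551L, 10287L, 11498L, 13384L, 15132L),
  `23` = c(6995L, 7778L, 7664L, 8054L, 9801L, 11080L),
  `24` = c(5589L, 6107L, 5948L, 6150L, 7470L, 8810L),
  `25to29` = c(15902L, 18153L, 18001L, 18833L, 23578L, 27299L),
  `30to39` = c(17171L, 18695L, 17818L, 17861L, 22643L, 26781L),
  `40plus` = c(8351L, 9686L, 9854L, 10141L, 13183L, 15888L),
  check.names = FALSE
)

# Assumed drawing order: the column order, one layer per age group.
groups <- names(AGEgroups)[-1]

# TRUE if a single later layer is taller than layer i in every year.
hidden <- sapply(seq_along(groups), function(i) {
  later <- groups[seq_along(groups) > i]
  any(vapply(
    later,
    function(g) all(AGEgroups[[g]] > AGEgroups[[groups[i]]]),
    logical(1)
  ))
})

groups[hidden]   # "17" "21" "22" "23" "24"
```

On this sample, 17 is covered by 18, and 21 to 24 are covered by 25to29. The 25to29 group is not fully covered, but it exceeds 30to39 by only a few hundred applicants in some years, so it would show as a very thin sliver at best. Melting the data and using a single `geom_area` layer avoids the problem because the groups are stacked rather than overdrawn.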

MrFlick
  • Thank you! It works now. As soon as I have 15 reputation points, I will vote your answer up! – Asiack Jun 28 '14 at 19:01