I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).

I would like to display this data as a count of the unique ID numbers by year, as a stacked area plot of the 6 Employment Record Types. That is: degree year on the x-axis, the number of distinct IDs for that year on the y-axis, and the fill showing the Record Type. I am using ggplot2 in RStudio.

I used the following code, but the y axis does not count distinct IDs:

ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
       fill=Employment.Data.Type)) + geom_freqpoly() +
       scale_fill_brewer(palette="Blues",
       breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))

I also tried changing y = Entity.ID to y = ..count.., but that did not work either. I have searched for solutions, and the problem seems to be in how I am writing the aes() code.

I also tried the following code based on examples of similar plots:

ggplot(AlumJobStatusCopy, aes(interval)) + 
      geom_area(aes(x=Degree.Year, y = Entity.ID, 
                    fill = Employment.Data.Type)) +
      scale_fill_brewer(palette="Blues",
          breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))

This does not even seem to work. I've read the documentation and am at my wit's end.

EDIT:

After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing as I am trying to display the rise in a lack of records and the decline in current records.

My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current

Ergo, each Year value has two corresponding percent values. I am now using 2 lines instead of an area plot, since the y-axis has distinct values instead of a count, but I would still like the area under each curve filled. I tried using melt() to convert the data from wide to long, but was still unable to fill under both lines. The fill is purely aesthetic: I would like to use a gradient for each, with one fill slightly lighter than the other.

Here is my current code:

ggplot(Alum, aes(Year)) + 
    geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
    geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) + 
    scale_y_continuous(labels = percent) + ylab('Percent of Total Records') + 
    ggtitle("Active, Living Alumni Employment Record") +
    scale_x_continuous(breaks=seq(1960, 2014, by=5))

I cannot post an image yet.

Gryu
KWalker
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can test the code and possible solutions. This will make it easier to help you. – MrFlick Jul 31 '15 at 22:57
  • I can't test it, because you didn't provide any data, but try: `ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID, fill=Employment.Data.Type)) + geom_bar(stat="identity")` That should give you a stacked bar. – RHA Aug 01 '15 at 08:31
  • @RHA, that code produces bars with one slice for each ID, colored by employment type. I think s/he wants a count of IDs by employment type. To get there with `y=Entity.ID`, I think you'd have to order the IDs by employment type. – ulfelder Aug 01 '15 at 11:52

1 Answer

I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:

# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
    Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
    Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))

# Here's the part you're missing, where you summarize for plotting 
library(dplyr)
dfsum <- df %>%
    group_by(Degree.Year, Employment.Data.Type) %>%
    tally()

# Now plot that, using the sums as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
    geom_bar(stat="identity") + labs(fill="Employment")

The result could use some fine-tuning, but I think it's what you mean. Here, the bars are equal height because each year in the toy data includes an equal number of IDs; if the count of IDs varied, so would the total bar height.
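Since the question asks for a stacked area plot of distinct IDs rather than bars, here is a minimal sketch of that variant. It assumes Degree.Year is stored as chr (as described in the question), so it is converted to a number first; geom_area() needs a continuous x to draw along. The names dfchr, dfarea, n_ids and p_area are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Toy data again, but with Degree.Year stored as character, as in the question
set.seed(1)
dfchr <- data.frame(
    Entity.ID = rep(LETTERS[1:10], 3),
    Degree.Year = rep(as.character(1990:1992), each = 10),
    Employment.Data.Type = sample(as.character(1:3), 30, replace = TRUE)
)

# Convert the year to a number, then count distinct IDs per year and type
dfarea <- dfchr %>%
    mutate(Degree.Year = as.integer(Degree.Year)) %>%
    group_by(Degree.Year, Employment.Data.Type) %>%
    summarise(n_ids = n_distinct(Entity.ID), .groups = "drop")

# Stacked area: geom_area() stacks the fill groups by default
p_area <- ggplot(dfarea, aes(x = Degree.Year, y = n_ids, fill = Employment.Data.Type)) +
    geom_area() +
    labs(y = "Distinct IDs", fill = "Employment")
p_area
```

If some year/type combinations are missing from the summary, the stacked areas can show gaps; filling those combinations with zeros first (for example with tidyr::complete()) should avoid that.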

If you don't want to add objects to your workspace, just do the summing in the call to ggplot():

ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
    aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
    geom_bar(stat="identity") + labs(fill="Employment")
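For the edited version of the question (two percent columns per year, with the area under each line filled), one possible approach is to add a geom_area() layer under each geom_line() and pick the two shades by hand. This is only a sketch: the Alum data below is invented to match the described columns, the two hex colours are arbitrary blues, and it uses flat semi-transparent fills rather than a true gradient.

```r
library(ggplot2)

# Hypothetical stand-in for Alum: two percent columns that sum to 1 each year
Alum <- data.frame(Year = 1960:2015)
Alum$Percent.Records <- seq(0.9, 0.3, length.out = nrow(Alum))
Alum$Percent.No.Records <- 1 - Alum$Percent.Records

p_fill <- ggplot(Alum, aes(Year)) +
    # One area layer per column; each layer holds a single group,
    # so the two areas overlap instead of stacking on each other
    geom_area(aes(y = Percent.No.Records, fill = "Percent.No.Records"), alpha = 0.5) +
    geom_area(aes(y = Percent.Records, fill = "Percent.Records"), alpha = 0.5) +
    geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
    geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
    # One fill slightly lighter than the other
    scale_fill_manual(values = c(Percent.Records = "#3182BD",
                                 Percent.No.Records = "#9ECAE1")) +
    scale_y_continuous(labels = scales::percent) +
    scale_x_continuous(breaks = seq(1960, 2014, by = 5)) +
    labs(y = "Percent of Total Records", fill = NULL, colour = NULL)
p_fill
```

Keeping the data wide mirrors the code in the question; after reshaping to long format, a single geom_area(position = "identity") layer with fill mapped to the variable name should give a similar result.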
ulfelder