
Inspired by this question, I would like to create a 100 % stacked area plot with ggplot2 showing movies by year, ordered by country. My data frame can be retrieved here. I have two variables, year and country. I know I have an error in my thinking, but I cannot find the solution.

The code I use is:

library(reshape)
library(ggplot2)

df <- read.csv(url("https://dl.dropboxusercontent.com/u/109495328/movie_db.csv"))
ggplot(df, aes(x=Year,y=Country,group=Country,fill=Country)) + geom_area(position="fill")

My graph looks like this:

[image]

But it is supposed to look something like this (example plot):

[image]

What am I missing?

Edit:

Axeman, I still do not understand how you get your Freq variable, even with your updated solution.

I am not sure whether this is necessary or whether ggplot does it automatically, but I think my actual issue is converting the data frame above into one that records how often each country appears in each year and stores that as a frequency:

From:

year country
2015 US
2015 US
2014 UK
2015 UK
2014 US
.
.
.

To:

year country freq
2015 US      6
2015 UK      7
2014 US      10
2014 UK      2
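
A minimal sketch of that conversion in base R, assuming a small made-up data frame called `movies` in place of the real file (the values are illustrative only). `table()` cross-tabulates the two columns, and `as.data.frame()` turns the result into long format with a `Freq` column:

```r
# Toy data standing in for the real movie data (values are made up)
movies <- data.frame(
  year    = c(2015, 2015, 2014, 2015, 2014),
  country = c("US", "US", "UK", "UK", "US")
)

# Cross-tabulate year by country, then reshape to long format:
# one row per year/country combination, with the count in a Freq column
freq <- as.data.frame(table(year = movies$year, country = movies$country))

# year comes back as a factor; convert it to numeric for a continuous x-axis
freq$year <- as.numeric(as.character(freq$year))

print(freq)
```

Combinations that never occur still get a row with `Freq` equal to 0, which is probably what a stacked area plot needs so that every country has a value for every year.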
Til Hund
  • You're missing a good explanation of what your graph is supposed to show. Your example has a continuous y-axis, your code has a factor as y. Also check `range(df$Year)`. – Axeman Dec 12 '15 at 21:05
  • Does using `stat = "identity"` and `position = "stack"` inside `geom_area` help? See [here for an example](http://stackoverflow.com/questions/22945651/how-to-remove-space-between-axis-area-plot-in-ggplot2) – Jaap Dec 12 '15 at 21:10
  • @Jaap, it does not work. The result looks similar to the graphical output I show above. @Axeman, I would like to show how many movies are in the data frame per year, broken down by the countries in which they were produced. I have the feeling that I have to add a third variable that only contains `1` in each row. Axeman, what do you suggest? – Til Hund Dec 12 '15 at 21:24

1 Answer


Still a bit unsure about what you want, but here is my attempt:

#load some libraries
library(dplyr)
library(tidyr)

#get rid of some clear errors in your supplied data
df <- filter(df, Country != '')
df <- droplevels(df)

#now pre-calculate the proportion for each country each year summing up to one.
#note that it may be more useful to have actual counts here instead of 0 or 1.
df2 <- table(Year = df$Year, Country = df$Country) %>% prop.table(1) %>% as.data.frame()
#fix year into a numeric
df2$Year <- as.numeric(as.character(df2$Year))

#make the plot
ggplot(df2, aes(x=Year,y=Freq,group=Country,fill=Country)) + 
  geom_area(alpha = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))

[image]
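
To illustrate what `prop.table(1)` does in the step above, here is a small sketch on made-up data (the `movies` data frame and its values are assumptions, not the real file). With margin 1, each row of the table, here each year, is divided by its own total, so the shares within a year sum to one:

```r
# Toy data standing in for the real movie data (values are made up)
movies <- data.frame(
  year    = c(2015, 2015, 2014, 2015, 2014),
  country = c("US", "US", "UK", "UK", "US")
)

# Raw counts: rows are years, columns are countries
counts <- table(Year = movies$year, Country = movies$country)

# margin = 1 normalises within each row, i.e. within each year
shares <- prop.table(counts, margin = 1)

# Each year's shares add up to 1
print(rowSums(shares))
```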

If you don't want them to sum to one, use this instead:

df3 <- table(Year = df$Year, Country = df$Country) %>% as.data.frame()
#fix year into a numeric
df3$Year <- as.numeric(as.character(df3$Year))

#make the plot
ggplot(df3, aes(x=Year,y=Freq,group=Country,fill=Country)) + 
  geom_area(alpha = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))

[image]

Axeman
  • Axeman, thank you very much for your attempt! We are quite close! I found my mistake: it is not a 100 % area plot but "just" an area plot (I adjusted the title accordingly). The y-axis is supposed to show the total number of movies each year. For example, if there are 100 movies in 2015, then 100 should be shown there. For 2014, however, there are only 50, so 50 should be indicated for 2014. I thought that `ggplot` retrieves these numbers, i.e. 100 for 2015 and 50 for 2014, by counting how many times each year appears in the data frame and saves them in the variable you called `Freq`. How can I achieve that? – Til Hund Dec 12 '15 at 22:41
  • Right, sorry, your example image threw me off. I think you want to use this line instead: `df2 <- table(Year = df$Year, Country = df$Country) %>% as.data.frame()` (without `prop.table`). – Axeman Dec 13 '15 at 13:56
  • Axeman, thank you very much for your reply. Please, note my edit above. – Til Hund Dec 17 '15 at 17:24
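
As a rough check of the idea discussed in these comments, the sketch below uses made-up data (the `movies` data frame is an assumption) to show that the `Freq` counts from `table()`, summed within a year, match the number of rows for that year. That sum is the height a stacked area plot of `Freq` would reach for the year:

```r
# Toy data standing in for the real movie data (values are made up)
movies <- data.frame(
  year    = c(2015, 2015, 2014, 2015, 2014),
  country = c("US", "US", "UK", "UK", "US")
)

# Per-year, per-country counts in long format (the Freq column)
freq <- as.data.frame(table(Year = movies$year, Country = movies$country))

# Summing Freq within each year gives the total height of the stack
stack_height <- tapply(freq$Freq, freq$Year, sum)

# The same totals, counted directly from the raw rows
year_totals <- table(movies$year)

print(stack_height)
print(year_totals)
```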