I'm trying to plot the number of exoplanets discovered per year, grouped by discovery method. Each row of my data has the form: planet | discovery method | year

I have this command:

ggplot(data, aes(x=pl_disc, fill=pl_discmethod)) + 
  geom_bar() + 
  scale_fill_brewer(palette="RdBu") +
  labs(x="Years", y="Count", 
       title="Number of exoplanets discovered per year",  
       fill="Discovery method")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +   
  scale_x_continuous("Years", labels=data$pl_disc, breaks=data$pl_disc)

And I get this plot:

enter image description here

I need a slightly different version of this plot:

  1. The years on the x-axis are incomplete. When there are no values to plot for a given year, that year is not labelled. I want every year to be shown.
  2. I also need to show the total count above each column. I tried this:
ggplot(data, aes(x=pl_disc, fill=pl_discmethod)) + 
  geom_histogram(binwidth = 1) + 
  scale_fill_brewer(palette="RdBu") + 
  stat_bin(aes(label=..count..),vjust=-1, geom="text", binwidth = 1) + 
  labs(x="Years", y="Count", 
       title="Number of exoplanets discovered per year", 
       fill="Discovery method")

But the values are overwritten, as seen in this plot:

enter image description here

How can I get the complete scale of years and the count for each column?

leinaD_natipaC
Amvd
  • About the incomplete x axis: please check which type pl_disc is. It's probably character or factor, so you might have to convert it to a date. And a hint: here are some ideas on how to prepare questions: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Wolfgang Arnold Mar 17 '20 at 12:27
  • pl_disc is the year, as shown on the x axis – Amvd Mar 17 '20 at 13:07
  • What does `typeof(data$pl_disc)` say? – Wolfgang Arnold Mar 17 '20 at 13:22
  • It is an integer. – Amvd Mar 17 '20 at 13:37
  • That seems OK. The reason might then be that you've specified the breaks to come from your source data, `scale_x_continuous("Years", labels=data$pl_disc, breaks=data$pl_disc)`, so if there are years missing from the data, they don't show up. If you post `dput(data)` or a subset such as `dput(data[1:10, ])`, it would be easier to re-run your code. – Wolfgang Arnold Mar 17 '20 at 14:27
  • Couldn't make it work. I obtain the same plot using dput. But how can I get the count of each column? – Amvd Mar 17 '20 at 18:25
  • Sorry, my comment was too short. With dput I meant: please make a dump of your data and add it to your question, so others can easily copy it into their R session and try to run your code. – Wolfgang Arnold Mar 18 '20 at 07:24
  • I tried to reproduce your plot with some example data, but I didn't get the effect with the missing years. For the labels you could try `stat_bin(aes(label=..count..), geom="text", binwidth = 1, position=position_stack(vjust = 0.5))` – Wolfgang Arnold Mar 18 '20 at 07:39
  • I already tried that for the counts, and the values are overwritten in the columns (I attached an image). I ran: `ggplot(data, aes(x=pl_disc, fill=pl_discmethod)) + geom_bar() + scale_fill_brewer(palette="RdBu") + labs(x="Years", y="Count", title="Number of exoplanets discovered per year", fill="Discovery method") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + scale_x_continuous("Years", labels=data$pl_disc, breaks=data$pl_disc) + stat_bin(aes(label=..count..), geom="text", binwidth = 1, position=position_stack(vjust = 0.5))` – Amvd Mar 18 '20 at 11:57
  • This is part of the data:

        loc_rowid  pl_name                  pl_discmethod    pl_disc
        1          11 Com b                 Radial Velocity  2007
        2          11 UMi b                 Radial Velocity  2009
        3          14 And b                 Radial Velocity  2008
        4          14 Her b                 Radial Velocity  2002
        5          16 Cyg B b               Radial Velocity  1996
        6          18 Del b                 Radial Velocity  2008
        7          1RXS J160929.1-210524 b  Imaging          2008

    – Amvd Mar 18 '20 at 12:00
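
For reference, here is a minimal sketch of one way to get both results, using only the seven rows posted in the comments. It assumes the overlapping labels come from `stat_bin` inheriting the `fill` grouping, so one label is drawn per discovery method rather than one per column. The names `year_breaks` and `totals` are illustrative and not from the original post. Building the breaks with `seq()` over the full range should label every year, and a separate `geom_text()` layer fed with precomputed per-year totals should put a single number above each column.

```r
library(ggplot2)

# The seven rows posted in the comments (only the two columns the plot uses).
data <- data.frame(
  pl_discmethod = c(rep("Radial Velocity", 6), "Imaging"),
  pl_disc       = c(2007L, 2009L, 2008L, 2002L, 1996L, 2008L, 2008L)
)

# One break per year over the whole range, including years with no discoveries.
year_breaks <- seq(min(data$pl_disc), max(data$pl_disc), by = 1)

# Total discoveries per year, computed once so each column gets exactly one label.
totals <- aggregate(n ~ pl_disc, data = transform(data, n = 1L), FUN = sum)

p <- ggplot(data, aes(x = pl_disc)) +
  # fill is set only on the bar layer, so the label layer is not split by method
  geom_bar(aes(fill = pl_discmethod)) +
  geom_text(data = totals, aes(y = n, label = n), vjust = -0.5) +
  scale_fill_brewer(palette = "RdBu") +
  scale_x_continuous("Years", breaks = year_breaks) +
  labs(y = "Count",
       title = "Number of exoplanets discovered per year",
       fill = "Discovery method") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

Printing `p` draws the plot. The design choice here is to compute the totals outside ggplot rather than through `stat_bin`, which keeps the labels independent of how the bars are grouped; whether this matches the intended figure on the full dataset has not been checked.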

0 Answers