
I'm summarising some data and I get different median values from a table created with the dplyr package and from a boxplot (ggplot2). The sample data can be found here:

dplyr table

library(dplyr)
library(ggplot2)

sample2 = read.csv("sample2.csv")

sample2 %>%
  group_by(category) %>%
  summarise(median_avg = median(avg_value), median_total = median(total_value))

The result is a median total of about 307 (306.95) for the 3+ category:

# A tibble: 3 × 3
  category median_avg median_total
     <chr>      <dbl>        <dbl>
1        1     17.500        37.07
2        2     16.830       117.48
3       3+     17.375       306.95

However, when I visualise it as a boxplot, I get a different median for the 3+ category, below 200:

boxplot

sample2 %>%
  ggplot(aes(category, total_value)) + geom_boxplot() +
  scale_y_continuous(limits = c(0,500))


I tried this with dummy data and there was no discrepancy between the table and the boxplot. Any ideas what causes the problem in this particular dataset? Thanks for your help!

Kasia Kulma
  • Are all of the values in your data between 0 and 500? `scale_y_continuous(limits = c(0,500))` will cause any data outside this range to be excluded from the calculations. To include all data, but still set the y-range for the graph, do `coord_cartesian(ylim=c(0,500))` – eipi10 Sep 12 '16 at 20:56
  • this is it, thanks! make it an answer and I'll happily upvote it :) – Kasia Kulma Sep 12 '16 at 20:58
  • This question is a duplicate, so I'll mark it as such in a moment and link to the previous answer. – eipi10 Sep 12 '16 at 21:01
  • well, I didn't know I was removing the points rather than just limiting the scale before I posted this question, does it still count as a duplicate? ;) Thanks again for help! – Kasia Kulma Sep 12 '16 at 21:10
  • @ProcrastinatusMaximus, I think [this one](http://stackoverflow.com/questions/29304712/ggplot2-boxplot-medians-arent-plotting-as-expected?rq=1) is actually a better duplicate, or should at least be added to the other one. – eipi10 Sep 12 '16 at 21:55
  • @eipi10 I could reopen it and then ask two users without a hammer to mark it as a dupe; after that you can hammer it. Another option is to add a more explicit comment with your link (I will do so anyway after this comment) – Jaap Sep 13 '16 at 06:17
  • @eipi10 Could you close it again with your [**link**](http://stackoverflow.com/questions/29304712/ggplot2-boxplot-medians-arent-plotting-as-expected)? – Jaap Sep 14 '16 at 09:00
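
To illustrate the point from eipi10's first comment, here is a minimal sketch. It uses a small made-up data frame (`toy`, not the asker's `sample2.csv`), so the values and variable names are assumptions chosen only to make the effect visible. It compares the median of all values with the median of the values that survive a 0 to 500 limit, then builds the boxplot both ways.

```r
library(ggplot2)

# Hypothetical data, NOT the asker's sample2.csv: one category with
# several values above the 500 cut-off.
toy <- data.frame(
  category    = "3+",
  total_value = c(50, 120, 180, 300, 650, 900, 1200)
)

# Median over all rows, which is what summarise(median(total_value)) reports.
median_all <- median(toy$total_value)    # 300

# Median over the rows that fall inside [0, 500] only. This mimics what the
# boxplot statistics see when limits are set on the scale.
in_range    <- toy$total_value >= 0 & toy$total_value <= 500
median_kept <- median(toy$total_value[in_range])    # 150

# Limits on the scale: out-of-range values are dropped before the boxplot
# statistics are computed (ggplot2 warns about removed rows when drawing).
p_scale <- ggplot(toy, aes(category, total_value)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 500))

# Limits on the coordinate system: all rows are kept and the view is zoomed.
p_coord <- ggplot(toy, aes(category, total_value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 500))
```

Printing `p_scale` and `p_coord` side by side should show the median line at different heights, and `layer_data(p_scale)$middle` versus `layer_data(p_coord)$middle` gives the medians each plot actually draws. On the real data, something like `sum(sample2$total_value > 500)` would show how many rows the scale limit discards.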
