I've used the following approach to create three histograms. The fourth one suddenly has a reversed order on the x-axis. However, nothing in the snippet (at least nothing I know of) should affect the order.

The x-axis is expected to start with the lowest value on the left.

Here's the R code:

df <- mydata %>% mutate(length.class=cut(mydata$count,breaks = c(1,10,100,1000,10000,100000,1000000,10000000),include.lowest=TRUE,dig.lab=8)) %>% group_by(length.class) %>% summarise(count = n())
dftext <- as.data.frame(table(df$length.class))
colnames(dftext)[1] <- "x"
dftext$lab[dftext$x == "[1,10]"] <- 1063393
dftext$lab[dftext$x == "(10,100]"] <- 65986
dftext$lab[dftext$x == "(100,1000]"] <- 3206
dftext$lab[dftext$x == "(1000,10000]"] <- 386
dftext$lab[dftext$x == "(10000,100000]"] <- 32
dftext$lab[dftext$x == "(100000,1000000]"] <- 0
dftext$lab[dftext$x == "(1000000,10000000]"] <- 1

df$count[df$length.class == "(1000000,10000000]"] <- 1.1  # To make its bar visible

fmt <- function(decimals=0){
    function(x) format(x,scientific = FALSE)
}

ggplot(df,aes(length.class,count)) + geom_bar(stat = "identity",width=0.9,fill="#999966") + scale_y_log10(labels = fmt()) + labs(x="", y="") + geom_text(data=dftext, aes(x=x, y=2, label=lab), size = 6) + theme(text = element_text(size=20)) +
    theme(axis.line = element_line(colour = "black"),
          panel.grid.major = element_line(color = "grey"),
          panel.grid.minor = element_line(color = "grey"),
          panel.background = element_blank(),
          axis.title.x = element_text(margin=margin(t = 15, unit = "pt")),
          axis.text.x = element_text(angle = 45, hjust = 1))

What is causing the reverse order and how can I get rid of it?

Edit: You guys are fast! :) The answer from @mark-peterson looks pretty solid, but I couldn't get any working results with it. Here's the requested data: mydata.csv

haggis
  • It would help if you gave a sample `mydata` so your code will work – Pierre L Sep 16 '16 at 18:49
  • In the meantime, try adding `scale_x_reverse()`. It's ready-made for this. This might be [the best duplicate](http://stackoverflow.com/questions/29127035/ggplot-reflect-plot-about-y-axis/29127211#29127211) – Pierre L Sep 16 '16 at 18:59
  • I get a "Harmful Programs" warning when I go to download the data. Can you `dput` a subset that results in the plotting error instead? – Mark Peterson Sep 16 '16 at 19:48
  • I uploaded it to another site: https://ufile.io/dd056 It's just a textfile. Does it help? – haggis Sep 16 '16 at 19:58
  • I did get it from there -- I like helping on SO, I am just not willing to risk a malicious site to do it ;) I did get my solution working on your data, though it still requires @aosmith 's answer to get the right sorting. – Mark Peterson Sep 16 '16 at 20:10

2 Answers

Your two datasets have the same levels for the factors length.class and x, but there is no row for (100000,1000000] in your first dataset, df. This is because summarise has no drop = FALSE option to keep all levels of a factor in the dataset regardless of whether they have any observations.

As you built your plot using the dataset in which fewer factor levels appear in the rows, it looks like ggplot2 gets confused when you add the new layer that has more factor levels, and things get ordered oddly.

A fix is to make sure the x axis doesn't drop any factor levels by using drop = FALSE in scale_x_discrete. That way you will be working with the same factor levels for the x axis for both datasets and things won't get mis-ordered.

+ scale_x_discrete(drop = FALSE)
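
To make the mismatch easier to see, here is a minimal sketch on made-up data (the data frame `toy`, its values, and the names `summarised` and `labels_df` are assumptions for illustration, not the asker's data). It shows that the summarised data keeps all factor levels but has no row for the empty bin, and where `scale_x_discrete(drop = FALSE)` would go:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: no value falls into the (100,1000] bin.
toy <- data.frame(count = c(1, 5, 50, 5000))
toy$length.class <- cut(toy$count,
                        breaks = c(1, 10, 100, 1000, 10000),
                        include.lowest = TRUE, dig.lab = 8)

# Summarising by bin yields one row per bin that actually occurs.
summarised <- toy %>%
  group_by(length.class) %>%
  summarise(count = n())

nlevels(summarised$length.class)  # levels the factor still knows about
nrow(summarised)                  # rows present after summarising

# table() returns one row per level, including the empty one.
labels_df <- as.data.frame(table(toy$length.class))

# drop = FALSE keeps every level on the x axis for both layers.
p <- ggplot(summarised, aes(length.class, count)) +
  geom_bar(stat = "identity") +
  geom_text(data = labels_df, aes(x = Var1, y = 0.5, label = Freq)) +
  scale_x_discrete(drop = FALSE)
```

In more recent dplyr releases, `group_by(length.class, .drop = FALSE)` should, as far as I know, be another way to keep the empty level as a zero-count row, but I have not checked that against the original data.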
aosmith
  • So simple and it works! I didn't think about it to be a problem, as it wasn't one before. Maybe because the labeling changed. Thanks for helping me again. You should consider writing a R book ;) – haggis Sep 16 '16 at 19:50

When given text labels, geom_bar converts them to a factor and sorts the bars. My guess is that alphabetical and numerical order matched up for your previous uses, but did not for this one. I thought that @Pierre was right about scale_x_reverse(), but it doesn't appear to work on factors. Instead, you will need to set the factor order yourself. Without sample data, it is hard to help with that.
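
As a rough sketch of what setting the order yourself could look like (the vectors `bins` and `wanted` are invented for illustration and are not taken from the question's data), you can pass the desired order explicitly through `levels`:

```r
# Hypothetical bin labels stored as plain character strings.
bins <- c("(10,100]", "[1,10]", "(1000,10000]", "(100,1000]")

# By default, factor() sorts the labels as text, not by magnitude.
default_order <- levels(factor(bins))

# Spell out the order the axis should follow.
wanted <- c("[1,10]", "(10,100]", "(100,1000]", "(1000,10000]")
ordered_bins <- factor(bins, levels = wanted)

levels(ordered_bins)  # follows `wanted`, regardless of text sorting
```

A factor built this way should then be plotted in the order given by `wanted` when mapped to the x axis.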

A better question, however, is why you are doing so much work by hand here. The tools exist to automate much of your set up, with the added benefit of reducing errors and sorting the factor correctly. For example, with some reproducible data:

temp <- data.frame(a = 1:999)

temp$binned <-
  cut(temp$a, 10^(0:3), include.lowest = TRUE)

forText <-
  table(temp$binned) %>%
  as.data.frame()

ggplot(temp, aes(x = binned)) +
  geom_bar() +
  geom_text(data = forText
            , aes(x = Var1
                  , y = 75
                  , label = Freq))

If you just want a picture of the distribution, you can be even faster with a histogram:

ggplot(temp, aes(a)) +
  geom_histogram() +
  scale_x_log10()

(Also, in the future, try to strip down to an MWE -- no need to include lots of theme settings if they are not germane to the problem.)

Using the posted data, I got the plot to work with my approach above. Note that you would need to add the additional theme and scale arguments. You also need to make use of @aosmith's answer about the missing value. (Which, I think, means that @aosmith's answer actually answers your question, while mine may be just good advice for how to do this more quickly.)

mydata$binned <-
  cut(mydata$count,breaks = c(1,10,100,1000,10000,100000,1000000,10000000),include.lowest=TRUE,dig.lab=8)

forText <-
  table(mydata$binned) %>%
  as.data.frame()

ggplot(mydata, aes(x = binned)) +
  geom_bar() +
  geom_text(data = forText
            , aes(x = Var1
                  , y = 75
                  , label = Freq)) +
  scale_x_discrete(drop = FALSE)
Mark Peterson
  • Thank you! Unfortunately it doesn't work. Your short cut command creates a 3rd column on my data frame which has 3 NAs for [1,10] until (100,1000] and then starts with (100,1e+03] for the row (1000,100000] down to [1,10] for the highest row (1m,10m]. – haggis Sep 16 '16 at 19:46
  • What do you mean by MWE? Google couldn't help me on that. – haggis Sep 16 '16 at 19:46
  • MWE is a Minimal Working Example. Here is the [Stack Overflow description](http://stackoverflow.com/help/mcve), and [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) is the R-specific information. The cut points that I used are specific to my example data. If you have lower (or higher) values, you will need to add them - you should be able to use the same breaks as in your original question. – Mark Peterson Sep 16 '16 at 19:51