I am trying to make a chart in ggplot but R has a stupid sorting method. Instead of putting the biggest number last, it sorts by the first digit (1-9), so for example 100k ends up below 2. Could someone tell me how I can fix this?

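    # Labels are Polish: title "Difference in athletes' earnings across disciplines for each country"
    ggplot(AWD, aes(nationality, discipline, size = money)) +
      geom_point() +
      theme(text = element_text(size = 25),
            axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4)) +
      labs(title = "Roznica w wielkosci zarobkow sportowcow roznych dziedzin dla kazdego panstwa",
           x = "Reprezentowane panstwo", y = "Rodzaj sportu",
           size = "Poziom zarobow")  # the legend belongs to the size aesthetic, so label size, not fill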

Here is the reproducible example:

    AWD <- data.frame(name = c("Aaron Donald", "Aaron Rodgers", "Albert Pujols", "Alexis Sánchez"),
                      nationality = c("Argentyna", "Brazylia", "Chile", "Dominikana"),
                      discipline = c("Baseball", "Boks", "Formula 1", "Futbol amerykanski"),
                      money = c("41,400,000", "89,300,000", "100,000,000", "30,700,000"))
    AWD$money <- as.factor(AWD$money)

[Image: what the chart looks like]

Edit:

I took the liberty of cutting down the reproducible example code and renaming several variables. With one exception, the resulting data.frame is identical to the original, and the code is somewhat more readable. It also does not mess up my RStudio layout.

The exception is adding a value that illustrates the issue that happens when ggplot displays the factor.

I renamed the variables mostly because I do not speak Polish, and it would take me time to type the original names correctly each time. I will provide an answer shortly; it should work with this edit as is. Otherwise it will require some minor tweaking.

Community
  • That is because your variable is treated as a factor. Try removing the `as.factor` call and see if that solves the problem. – Jonathan V. Solórzano Jan 03 '20 at 03:19
  • It's not stupid, it's just not reading your mind. You haven't done anything that would make anything be sorted by count. It seems a weird choice to make the size a factor; don't you want that to be a continuous variable? To help much more, we'd need to see a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Jan 03 '20 at 03:26
  • I deleted `as.factor` and it changed nothing; the chart looks exactly the same. –  Jan 03 '20 at 10:51
  • I like how people give dislikes but won't tell you how you can fix your problem ehhh –  Jan 03 '20 at 11:01
  • Does anyone know how to help me? –  Jan 03 '20 at 15:16

1 Answer

The reason for the observed behavior is that factor levels are handled as strings, so they are sorted alphabetically rather than numerically. In ascending order this puts "100" before "99", because the comparison goes character by character and "1" comes before "9".
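To make the difference concrete, here is a minimal sketch in base R using the same money values as the question. The names `money_chr` and `money_num` are introduced here only for illustration.

```r
# Money values stored as text, as in the question's data
money_chr <- c("41,400,000", "89,300,000", "100,000,000", "30,700,000")

# factor() sorts its levels as strings, so "100,000,000" comes first
levels(factor(money_chr))
# "100,000,000" "30,700,000" "41,400,000" "89,300,000"

# Ordering the same values as numbers (commas removed) gives the expected result
money_num <- as.numeric(gsub(",", "", money_chr))
money_chr[order(money_num)]
# "30,700,000" "41,400,000" "89,300,000" "100,000,000"
```

The first result is the order ggplot uses for the legend and sizes when `money` is a factor.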

The workaround was a bit tricky. I used the stringr package for easier string manipulation; the rest is base R. There might be a more elegant way with dplyr or a similar package if you don't mind the additional dependencies.

Since my edit is not yet visible, here is the data I used as a baseline:

    AWD <- data.frame(
      name = c("Aaron Donald", "Aaron Rodgers", "Albert Pujols", "Alexis Sánchez"),
      nationality = c("Argentyna", "Brazylia", "Chile", "Dominikana"),
      discipline = c("Baseball", "Boks", "Formula 1", "Futbol amerykanski"),
      money = c("41,400,000", "89,300,000", "100,000,000", "30,700,000"))
    AWD$money <- as.factor(AWD$money)

The solution is this:

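    library(stringr)
    # Rank the levels by numeric value (commas stripped), then rebuild the factor so only the level order changes
    newOrder <- order(as.numeric(str_replace_all(levels(AWD$money), ",", "")))
    AWD$money <- factor(AWD$money, levels = levels(AWD$money)[newOrder])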

The `str_replace_all` call is necessary because `as.numeric` cannot parse strings that contain commas; it returns `NA` with a warning for values such as "41,400,000". After this the plot should work as intended.
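As suggested in the comments on the question, another option is to drop the factor and store the money as a number, so that ggplot can treat size as a continuous variable. The following is only a sketch in base R, assuming the column always uses commas as thousands separators; `money_num` is a column name introduced here for illustration.

```r
AWD <- data.frame(
  nationality = c("Argentyna", "Brazylia", "Chile", "Dominikana"),
  discipline = c("Baseball", "Boks", "Formula 1", "Futbol amerykanski"),
  money = c("41,400,000", "89,300,000", "100,000,000", "30,700,000"))
AWD$money <- as.factor(AWD$money)

# Go through as.character() first: as.numeric() applied directly to a factor
# returns the internal level codes, not the values shown in the labels
AWD$money_num <- as.numeric(gsub(",", "", as.character(AWD$money)))

AWD$money_num
```

Mapping `size = money_num` in `aes()` should then give a continuous size scale instead of one legend entry per level.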

Side note: Working with the original reproducible example was a pain. Please try to cut the code down to the bare minimum next time.

Shamis