I'm looking for help sorting my geom_col() bar plot in RStudio. Nothing I have tried has worked. Any help would be most appreciated.

This question was marked as a duplicate by Gregor; however, none of the answers to the question in the referenced link work here.

I have the following three-column, tab-separated file with headers, which I am trying to sort (only the first 8 lines are shown):

POPULATION  EXCESS_ALLELE_MATCHES_WITH_MBUTI    GROUP
Jordanian   1,059                               W Asians
BedouinB    937                                 W Asians
Saudi       894                                 W Asians
GujaratiD   835                                 S/SC Asians
Druze       722                                 W Asians
Iran_Fars   704                                 W Asians
Pathan      660                                 S/SC Asians

Here is my R code, which works fine except that I'm not able to sort the bars:

test <- read.csv(file_name, sep = "\t", stringsAsFactors = FALSE,
                 header = TRUE)

ggplot(test, aes(x=POPULATION, y=EXCESS_ALLELE_MATCHES_WITH_MBUTI, fill=GROUP)) + 
 geom_col() + 
 coord_flip()

My current output is an unsorted bar plot:

[Image: output bar plot]

BroVic
Gene100

1 Answer

You need to convert the POPULATION variable to a factor, and then reorder its levels by the value you want to sort on. ggplot2 draws a discrete axis in factor-level order, so reordering the levels reorders the bars. The forcats package is particularly useful for this:

library(tidyverse)

test <- tribble(
  ~POPULATION,  ~EXCESS_ALLELE_MATCHES_WITH_MBUTI,    ~GROUP,
  "Jordanian" ,  1059,   "W Asians",
  "BedouinB",    937, "W Asians",
  "Saudi",   894, "W Asians",
  "GujaratiD",   835, "S/SC Asians",
  "Druze",   722, "W Asians",
  "Iran_Fars",   704, "W Asians",
  "Pathan",  660 ,"S/SC Asians"
)

test$POPULATION <- factor(test$POPULATION) %>%
  fct_reorder(test$EXCESS_ALLELE_MATCHES_WITH_MBUTI)

ggplot(test, aes(x=POPULATION, y=EXCESS_ALLELE_MATCHES_WITH_MBUTI, fill=GROUP)) + 
  geom_col() + coord_flip()

Phil
  • Thanks Phil, but I got the following error message after I ran your code: Error: `fun` must return a single number per group – Gene100 Jan 08 '18 at 00:27
  • I'm unable to reproduce that error. Maybe restart your session and try again? I've updated my answer with the full code. – Phil Jan 08 '18 at 01:19
  • Thanks Phil, but the error persists after restarting. It may have to do with how I'm loading the file: test <- read.csv(file_name, sep="\t", stringsAsFactor = FALSE, header = TRUE). If possible can you please load the file instead of manually entering the data. I believe you can copy and paste my sample file into a text file. Thanks in advance. – Gene100 Jan 08 '18 at 01:46
  • After you load the file, what types of variables are you getting by using `str(test)`? – Phil Jan 08 '18 at 01:48
  • This is what I get: str(test) 'data.frame': 29 obs. of 3 variables: $ POPULATION : chr "Jordanian" "BedouinB" "Saudi" "GujaratiD" ... $ EXCESS_ALLELE_MATCHES_WITH_MBUTI: chr "1,059" "937" "894" "835" ... $ GROUP : chr "W Asians" "W Asians" "W Asians" "S/SC Asians" ... – Gene100 Jan 08 '18 at 02:01
  • Your numeric variable is set as a character variable. Use `as.numeric()` to turn it to numeric. – Phil Jan 08 '18 at 02:02
  • Thanks Phil, I used this to convert column 2: test$EXCESS_ALLELE_MATCHES_WITH_MBUTI <- as.numeric(as.character(test$EXCESS_ALLELE_MATCHES_WITH_MBUTI)). Everything looks good except the POPULATION with the highest value, "Jordanian" was omitted from the plot. Any ideas? – Gene100 Jan 08 '18 at 02:24
  • I'd suspect you somehow removed it from the dataset. – Phil Jan 08 '18 at 02:34
  • I checked it is still there but was converted to N/A since it was the 2nd row in the table (1st row is header). No idea why that was the only row converted to N/A. I used the above to convert column 2 to numeric – Gene100 Jan 08 '18 at 02:42
  • I suspect the comma is the issue. Run `stringr::str_replace(",", "")` before using `as.numeric()`. – Phil Jan 08 '18 at 02:45
  • `stringr::str_replace(test$EXCESS, ",", "")` – Phil Jan 08 '18 at 02:55
  • That "," was preventing column 2 from being recognized as numeric. As soon as I removed it, the whole column changed to "integer" and I did not need to convert it to numeric. Thanks for all your help. You are AWESOME Phil. I upvoted your answer but apparently don't have enough posts to have it register – Gene100 Jan 08 '18 at 03:01
  • No worries, glad you were able to make it work. – Phil Jan 08 '18 at 03:08
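
Putting the comment thread together, here is a minimal sketch of the whole pipeline starting from tab-separated text rather than hand-typed data. The inline `raw` string is only a stand-in for the asker's file (an assumption: the real file has the same three columns and uses "," as a thousands separator), and `gsub()` is used here in place of the `stringr::str_replace()` call suggested above; either should work for a single comma per value.

```r
library(ggplot2)
library(forcats)

# Stand-in for the tab-separated file from the question (assumed layout).
raw <- "POPULATION\tEXCESS_ALLELE_MATCHES_WITH_MBUTI\tGROUP
Jordanian\t1,059\tW Asians
BedouinB\t937\tW Asians
Saudi\t894\tW Asians
GujaratiD\t835\tS/SC Asians
Druze\t722\tW Asians
Iran_Fars\t704\tW Asians
Pathan\t660\tS/SC Asians"

test <- read.csv(text = raw, sep = "\t", stringsAsFactors = FALSE,
                 header = TRUE)

# The single "1,059" makes read.csv() treat the whole column as character,
# and as.numeric("1,059") gives NA, which is why Jordanian went missing.
# Strip the commas first, then convert.
test$EXCESS_ALLELE_MATCHES_WITH_MBUTI <-
  as.numeric(gsub(",", "", test$EXCESS_ALLELE_MATCHES_WITH_MBUTI, fixed = TRUE))

# Reorder the factor levels by the numeric column so the bars come out sorted.
test$POPULATION <- fct_reorder(test$POPULATION,
                               test$EXCESS_ALLELE_MATCHES_WITH_MBUTI)

ggplot(test, aes(x = POPULATION, y = EXCESS_ALLELE_MATCHES_WITH_MBUTI,
                 fill = GROUP)) +
  geom_col() +
  coord_flip()
```

With a real file, replacing `text = raw` with the file path should be the only change needed. Running `str(test)` after the conversion is a quick way to confirm the column is numeric before plotting.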