I have the following R code which contains some dummy data. I am trying to create a bubble chart where the size of the bubble is dependent on the amount and is positioned based on the profitability (as a % of the amount) on the x-axis and the volatility (as a % of the amount) on the y-axis. Code is as follows:

library(rio)      # provides import()
library(ggplot2)  # provides ggplot()

rio_csv <- import("~/Desktop/R/Dummy Data.csv")

# Select columns to go into df
df <- data.frame("Volpc" = rio_csv[, 6], "Profitpc" = rio_csv[, 5], "Amount" = rio_csv[, 4])

# Plot bubble chart
plot <- ggplot(df, aes(x = Profitpc, y = Volpc, size = Amount)) +
  geom_point(alpha = 0.2) + scale_size(range = c(5, 15)) + xlab("Profitability %") +
  ylab("Volatility %")

plot

The profitability measure on the x-axis is a percentage and the volatility on the y-axis is a percentage. They both have the data type 'character'.

My first problem is that when I run the code, a bubble chart appears but the x-axis is not in numerical order, although the y-axis is.

I tried to use the code df$Profitpc <- as.numeric(df$Profitpc) but this causes all the values in the column to go NA with the warning 'NAs introduced by coercion'.

Is there a way of ordering the x-axis so it is in numerical order (increasing)?

My second problem is that neither axis is suitably scaled. Ideally, I would like both axes to run from 0 to the maximum % value. Is there a way to do this as well? I am sorry if this is obvious. I have attached a picture of the chart to illustrate the issues.

Phil
Dinks123

2 Answers

You've given us your code but not your data, so this isn't a simple self-contained example or a reprex. [See this post for more advice on how to give us what we need to help you.]

However, from the symptoms you describe, I'm guessing that df$Profitpc contains values such as 27.0%. That's why as.numeric() fails: it doesn't know how to handle the %. So your solution is to reformat your input data so that df$Profitpc truly is a numeric. Then the graph will behave as you want. As you haven't given us your input data, you're on your own when it comes to doing that...
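As a minimal sketch of that reformatting step, assuming the values really do look like "27.0%" (the vector below is made up for illustration, not taken from the asker's file), the % sign can be stripped before converting:

```r
# Hypothetical values in the format guessed at above
profit_raw <- c("10%", "15.50%", "81.40%", "50%")

# as.numeric() cannot parse the trailing "%", so every value becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning here)
failed <- suppressWarnings(as.numeric(profit_raw))

# Remove the literal "%" first, then convert
profit_num <- as.numeric(sub("%", "", profit_raw, fixed = TRUE))
profit_num
```

The same two-step conversion could be applied to df$Profitpc and df$Volpc, provided they hold text of this shape.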

Personally, I'd make the same change to df$Volpc as well. As you've discovered, it's only luck that has presented the data in the order you want it. Once you've got numeric data (and as a result, the order of display that you want), you can use features of ggplot to format its appearance the way you want.
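For illustration, here is one way the formatting could be handled once both columns are numeric. This is only a sketch: df_num and pct_label are hypothetical names, and the values are invented. Setting limits = c(0, NA) asks ggplot2 to start each axis at 0 and take the upper end from the data.

```r
library(ggplot2)

# Hypothetical data, already numeric (percentages stored as plain numbers)
df_num <- data.frame(
  Profitpc = c(10, 15.5, 81.4, 50),
  Volpc    = c(30, 20, 80.3, 30.3),
  Amount   = c(10, 15, 6, 12)
)

# Hypothetical helper: appends "%" to the tick labels only, not to the data
pct_label <- function(x) paste0(x, "%")

p <- ggplot(df_num, aes(x = Profitpc, y = Volpc, size = Amount)) +
  geom_point(alpha = 0.2) +
  scale_size(range = c(5, 15)) +
  scale_x_continuous(limits = c(0, NA), labels = pct_label) +
  scale_y_continuous(limits = c(0, NA), labels = pct_label) +
  labs(x = "Profitability %", y = "Volatility %")

p
```

Keeping the % sign in the labels rather than in the data means the axes stay continuous and sort numerically.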

The lesson here is that it is important to separate the derivation of your data from its presentation.

Limey
  • Yes, you are correct: the values were in a percentage format, which it did not like. Converting both Volpc and Profitpc into a different format solves both issues. Thank you, I will make sure to provide a head(df) for future questions. – Dinks123 May 30 '20 at 18:19

I second @Limey. Still, you could check whether Profitpc is a factor and, if so, convert it to character like this:

ggplot(df, aes(x = as.character(Profitpc), y = (Volpc), size = Amount)) + 
  geom_point(alpha = 0.2) + scale_size(range = c(5,15)) + xlab("Profitability %") + 
  ylab("Volatility %") 

This still does not guarantee that the order will be right, so I would also convert the variables to numeric. You could use parse_number() from the readr package like this:

ggplot(df, aes(x = readr::parse_number(Profitpc), y = readr::parse_number(Volpc), size = Amount)) + 
  geom_point(alpha = 0.2) + scale_size(range = c(5, 15)) + xlab("Profitability %") + 
  ylab("Volatility %") 

Data

df <- tibble::tribble(
  ~Profitpc, ~Volpc,   ~Amount,
  "10%",     "30%",    10L,
  "15.50%",  "20%",    15L,
  "81.40%",  "80.30%", 6L,
  "50%",     "30.3%",  12L
)
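To see why a character column can come out in the wrong order, here is a small base-R sketch with made-up values (pct_chr and pct_num are hypothetical names). Text is compared character by character from the left, so "100%" sorts before "20%":

```r
# Hypothetical percentage strings
pct_chr <- c("3%", "20%", "100%")

# Character sort: compares text left to right, so "1..." < "2..." < "3..."
chr_order <- sort(pct_chr)

# Numeric sort: strip the "%" and order by the actual value
pct_num <- as.numeric(sub("%", "", pct_chr, fixed = TRUE))
num_order <- pct_chr[order(pct_num)]

chr_order  # "100%" "20%" "3%"
num_order  # "3%" "20%" "100%"
```

A discrete axis in ggplot2 follows the first kind of ordering, which is why converting to numeric fixes the axis.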
Ahorn