
I am trying to create a diagram using ggplot2. There are several very small values to be displayed and a few larger ones. I'd like to display all of them in an appropriate way using logarithmic scaling. This is what I do:

plotPointsPre <- ggplot(data = solverEntries, aes(x = val, y = instance, 
                                                  color = solver, group = solver))

...

finalPlot <- plotPointsPre + coord_trans(x = 'log10') + geom_point() +
  xlab("costs") + ylab("instance")

This is the result:

[resulting plot: no visible change to the x-axis]

It is just the same as without coord_trans(x = 'log10').

However, if I use it with the y-axis, it works:

[plot with logarithmic scaling applied to the y-axis]

How do I achieve the logarithmic scaling on the x-axis? It does not seem to be specific to the x-axis, either: if I switch the values of x and y, then it works on the x-axis and no longer on the y-axis. So there seems to be some problem with the displayed values. Does anybody have an idea how to fix this?

Edit: here's the data contained in `solverEntries`:

solverEntries <- data.frame(instance = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20),
                 solver = c(4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1),
                 time = c(1, 24, 13, 6, 1, 41, 15, 5, 1, 26, 16, 5, 1, 39, 7, 4, 1, 28, 11, 3, 1, 31, 12, 3, 1, 38, 20, 3, 1, 37, 10, 4, 1, 25, 11, 3, 1, 32, 18, 4, 1, 27, 21, 3, 1, 23, 22, 3, 1, 30, 17, 2, 1, 36, 8, 3, 1, 37, 19, 4, 1, 40, 21, 3, 1, 29, 11, 4, 1, 33, 10, 3, 1, 34, 9, 3, 1, 35, 14, 3),
                 val = c(6553.48, 6565.6, 6565.6, 6577.72, 6568.04, 7117.14, 6578.98, 6609.28, 6559.54, 6561.98, 6561.98, 6592.28, 6547.42, 7537.64, 6549.86, 6555.92, 6546.24, 6557.18, 6557.18, 6589.92, 6586.22, 6588.66, 6588.66, 6631.08, 6547.42, 7172.86, 6569.3, 6582.6, 6547.42, 6583.78, 6547.42, 6575.28, 6555.92, 6565.68, 6565.68, 6575.36, 6551.04, 6551.04, 6551.04, 6563.16, 6549.86, 6549.86, 6549.86, 6555.92, 6544.98, 6549.86, 6549.86, 6561.98, 6558.36, 6563.24, 6563.24, 6578.98, 6566.86, 7080.78, 6570.48, 6572.92, 6565.6, 7073.46, 6580.16, 6612.9, 6557.18, 7351.04, 6562.06, 6593.54, 6547.42, 6552.3, 6552.3, 6558.36, 6553.48, 6576.54, 6576.54, 6612.9, 6555.92, 6560.8, 6560.8, 6570.48, 6566.86, 6617.78, 6572.92, 6578.98))
– balderdash
  • Please provide some or all of the data `solverEntries` to [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – neilfws May 08 '19 at 23:42
  • Have you tried `scale_x_log10()`? – Jon Spring May 08 '19 at 23:51
  • @JonSpring Yes, that didn't work either. – balderdash May 09 '19 at 09:04
  • @neilfws I added the data. – balderdash May 09 '19 at 09:08
  • It's not correct that "There are several very small values to be displayed and a few larger ones." In the data you have many values close to 6500 and a few in the 7000s. I don't think a log adjustment is helpful here, since the top values are only about 10% higher than the low ones. – Jon Spring May 09 '19 at 14:56
  • Addendum: added an answer that uses a log transform after normalizing the data. By stretching the scale, the underlying values become more distinguishable, at the cost of making proportions harder to perceive. – Jon Spring May 09 '19 at 15:29

2 Answers


Your data in its current form is not log distributed: most `val` are around 6500 and some are about 10% higher. If you want to stretch the data, you could build a custom transformation with `scales::trans_new()` (a sketch of that route appears at the end of this answer), or use this simpler version, which just subtracts a baseline value to make a log transform useful. After subtracting 6500, the small values are mapped to around 50 and the large values to around 1000, a range much better suited to a log scale. We then apply the same transformation to the breaks so that the labels appear in the right spots (i.e. the label 6550 is placed at the data value 6550 - 6500 = 50).

This method helps if you want to make the underlying values more distinguishable, but at the cost of distorting the proportions between them. You can mitigate that by picking useful breaks and labeling them with scaling stats, e.g. labeling the 7000 break as "7000 (+7% over min)"; see the labeling sketch after the plot below.

library(ggplot2)

my_breaks <- c(6550, 6600, 6750, 7000, 7500)
baseline <- 6500

ggplot(data = solverEntries, 
       aes(x = val - baseline, y = instance, 
           color = solver, group = solver)) +
  geom_point() +
  # shift the data onto a log-friendly range, but label the breaks
  # with the original (unshifted) values
  scale_x_log10(breaks = my_breaks - baseline,
                labels = my_breaks, name = "val")

[plot: points on a log10 x-axis after subtracting the baseline, labeled with the original values]
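Here is a minimal sketch of that labeling idea, assuming "percent over the minimum" is the stat you want; `pct_labels` and `min_val` are illustrative names introduced for this sketch, not part of the original answer:

library(ggplot2)

my_breaks <- c(6550, 6600, 6750, 7000, 7500)
baseline  <- 6500
min_val   <- min(solverEntries$val)

# label each break with its percentage above the minimum observed value,
# e.g. 7000 becomes "7000 (+7%)"
pct_labels <- sprintf("%.0f (+%.0f%%)", my_breaks,
                      100 * (my_breaks - min_val) / min_val)

ggplot(data = solverEntries,
       aes(x = val - baseline, y = instance,
           color = solver, group = solver)) +
  geom_point() +
  scale_x_log10(breaks = my_breaks - baseline,
                labels = pct_labels, name = "val")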

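For completeness, the `trans_new()` route mentioned above might look something like the following. `shifted_log10_trans()` is a hypothetical helper written for this sketch (not an existing scales function), and it assumes the same baseline of 6500:

library(ggplot2)
library(scales)

# hypothetical "shifted log10" transformation: log-transform the distance
# above a fixed baseline, so breaks and labels stay in original data units
shifted_log10_trans <- function(baseline = 6500) {
  trans_new(
    name      = paste0("shifted_log10-", baseline),
    transform = function(x) log10(x - baseline),
    inverse   = function(x) 10^x + baseline
  )
}

ggplot(data = solverEntries,
       aes(x = val, y = instance, color = solver, group = solver)) +
  geom_point() +
  scale_x_continuous(trans = shifted_log10_trans(),
                     breaks = c(6550, 6600, 6750, 7000, 7500))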
– Jon Spring
  • Jon Spring is correct. You need to "zoom-in" rather than `log10` the axis. You can see for yourself if you replace a few values in your `solverEntries` dataframe with low values and then `log10` the axis. – TheSciGuy May 09 '19 at 15:12

Is this what you're looking for?

library(ggplot2)
set.seed(1)  # make the random noise reproducible

x_data <- seq(from = 1, to = 50)
y_data <- 2 * x_data + rnorm(n = 50, mean = 0, sd = 5)

# linear (non-log) y-axis
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point()

# log y-axis
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_y_log10()

# log x-axis
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_x_log10()
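
A sketch applying the same idiom to the question's `solverEntries` data (assumed to already exist in the session); as noted in the comments on the question, the log scale changes this particular picture very little:

library(ggplot2)

# val only spans roughly 6545 to 7540, so a log10 x-axis
# looks almost identical to the linear one
ggplot(data = solverEntries,
       aes(x = val, y = instance, color = solver, group = solver)) +
  geom_point() +
  scale_x_log10()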
– Zeus
  • I'm not sure how to use your example with my x and y data; can you adjust your example using my data (solverEntries)? – balderdash May 09 '19 at 09:23