I am trying to recreate the basic temperature trend of this Paleotemperature figure in R. (Original image and data.)

The scale interval of the x-axis changes from 100s of millions of years to 10s of millions to millions, and then to 100s of thousands, and so on, but the tick marks are evenly spaced. The original figure was carefully laid out as five separate graphs in Excel to achieve the spacing. I am trying to get the same x-axis layout in R.

I have tried two basic approaches. The first was to use par(fig=c(x1,x2,y1,y2)) to make five separate graphs placed side by side. The problem is that the intervals between tick marks are not uniform and the labels overlap.

#1
par(fig=c(0,0.2,0,0.5), mar=c(3,4,0,0))
plot(paleo1$T ~ paleo1$Years, col='red3', xlim=c(540,60), bty='l',type='l', ylim=c(-6,15), ylab='Temperature Anomaly (°C)')
abline(0,0,col='gray')

#2
par(fig=c(0.185,0.4,0,0.5), mar=c(3,0,0,0), new=TRUE)
plot(paleo2$T ~ paleo2$Year, col='forestgreen', axes=FALSE, type='l', xlim=c(60,5), ylab='', ylim=c(-6,15))
axis(1)  # xlim is not an axis() argument; the axis range comes from plot()
abline(0,0,col='gray')

#etc.

The second approach (and my preferred one, if possible) is to plot the data in a single graph. This produces unevenly spaced tick marks because they follow their "natural" order. (Edit: example data added, as well as a link to the full data set.)

years <- c(500,400,300,200,100,60,50,40,30,20,10,5,4,3,2,1)
temps <- c(13.66, 8.6, -2.16, 3.94, 8.44, 5.28, 12.98, 8.6, 5, 5.34, 3.66, 2.65, 0.78, 0.25, -1.51, -1.77)
test <- data.frame(years, temps)
names(test) <- c('Year','T')

# The full csv file can be used with this line instead of the above.
# test <- read.csv('https://www.dropbox.com/s/u0dfmlvzk0ztpkv/paleo_test.csv?dl=1')

plot(test$T ~ test$Year, type='l', xaxt='n', xlim=c(520,1), bty='l', ylim=c(-5,15), xlab="", ylab='Temperature Anomaly (°C)')
ticklabels = c(500,400,300,200,100,60,50,40,30,20,10,5,4,3,2,1)
axis(1, at=ticklabels)

Adding log='x' to plot comes closest but the intervals between ticks are still not even and the actual scale is, of course, not a log scale.

My examples only go down to 1 million years because I am trying to solve the problem first, but my goal is to match the original figure above. I am open to ggplot solutions, although I am only fleetingly familiar with it.

Michael S Taylor
  • Interesting question. Can you provide some sample data in the question? See: [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Axeman Jan 04 '17 at 14:22
  • 5-10 data points per "time scale" / panel would be enough. Much more tempting to play with than a link to 8MB of data... – Henrik Jan 04 '17 at 14:24
  • I actually went to the trouble of downloading the data (that server is sloooow). Imagine my delight when I discovered that it is not what we consider [tidy data](http://vita.had.co.nz/papers/tidy-data.pdf) and would require a lot of clean-up and reorganization. Please share data in a usable format. – Roland Jan 04 '17 at 14:34
  • @Axeman I added sample data. – Michael S Taylor Jan 04 '17 at 14:45
  • @Roland I've just copied and pasted to get what I needed because the arrangement in the spreadsheet was definitely not tidy. I've added some sample data. – Michael S Taylor Jan 04 '17 at 14:46
  • The gridExtra package might make this possible using ggplot: https://cran.r-project.org/web/packages/gridExtra/vignettes/arrangeGrob.html – JohnSG Jan 04 '17 at 14:48
  • There are some facilities for discontinuous scales in `plotrix`. Please note the note by the package author: "_There is some controversy about the propriety of using discontinuous coordinates for plotting, and thus axis breaks._ [...] _The major objection seems to be that the reader will be misled by assuming continuous coordinates_". Thus, (semi-)separated panels may be preferable. – Henrik Jan 04 '17 at 15:36
  • @Henrik I agree that confusion can be caused by discontinuous scales. However, this is the end part of an exercise for students. They will first plot the five panels separately (in Excel) and answer questions about trend, maximum rates of temperature change, etc., so students will become familiar with the scales. After that, they will execute the R script to see the five panels together in context. That should reduce confusion. Still, I will think about small gaps. – Michael S Taylor Jan 04 '17 at 15:50

3 Answers

I will strike a different note by saying: don't. In my experience, the harder something is to do in ggplot2 (and, to a lesser extent, base graphics), the less likely it is to be a good idea. Here, the problem is that changing the scale from section to section like this is likely to lead the viewer astray.

Instead, I recommend using a log scale and manually setting your cutoffs.

First, here is some longer data, just to cover the full likely scale of your question:

longerTest <-
  data.frame(
    Year = rep(1:9, times = 6) * rep(10^(3:8), each = 9)
    , T = rnorm(6*9))

Then, I picked some cutoffs to place the labels at in the plot. These can be adjusted to whatever you want, but are at least a starting point for reasonably spaced ticks:

forLabels <-
  rep(c(1,2,5), times = 6) * rep(10^(3:8), each = 3)

Then, I manually set some things to append to the labels. Thus, instead of having to say "Thousands of years" under part of the axis, you can just label those with a "k". Each order of magnitude gets a value. Note that the names are just to help keep things straight: below I just use the index to match. So, if you skip the first two, you will need to adjust the indexing below.

toAppend <-
  c("1" = "0"
    , "2" = "00"
    , "3" = "k"
    , "4" = "0k"
    , "5" = "00k"
    , "6" = "M"
    , "7" = "0M"
    , "8" = "00M")

Then, I change my forLabels into the text versions I want to use by grabbing the first digit, and concatenating with the correct suffix from above.

myLabels <-
  paste0(
    substr(as.character(forLabels), 1, 1)
    , toAppend[floor(log10(forLabels))]
  )

This gives:

 [1] "1k"   "2k"   "5k"   "10k"  "20k"  "50k"  "100k" "200k" "500k" "1M"   "2M"  
[12] "5M"   "10M"  "20M"  "50M"  "100M" "200M" "500M"
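
The index-matching caveat mentioned above can be sidestepped by computing the suffix from the magnitude directly. The following is only a minimal sketch and not part of the original approach: `label_years` is a hypothetical helper name, and it assumes positive values below 10^9.

```r
# Hypothetical helper (an assumption, not from the answer): build labels such
# as "5k", "20k" or "1M" without relying on positional indexing.
label_years <- function(x) {
  suffixes <- c("", "k", "M")               # one suffix per factor of 1000
  # The tiny offset guards against log10() landing just below a whole number.
  group <- floor(log10(x) + 1e-9) %/% 3     # 0 = units, 1 = thousands, 2 = millions
  paste0(x / 1000^group, suffixes[group + 1])
}

example_labels <- label_years(c(1000, 20000, 500000, 1e6, 5e8))
print(example_labels)
```

Because the suffix is derived from each value, dropping or adding orders of magnitude does not require re-indexing a lookup vector.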

You could likely use these for base graphics, but getting the log scale to do what you want is sometimes tricky. Instead, since you said you are open to a ggplot2 solution, I grabbed this modified log scale from this answer to get a log scale that runs from big to small:

library("ggplot2")
library("scales")
reverselog_trans <- function(base = exp(1)) {
  trans <- function(x) -log(x, base)
  inv <- function(x) base^(-x)
  trans_new(paste0("reverselog-", format(base)), trans, inv, 
            log_breaks(base = base), 
            domain = c(1e-100, Inf))
}

Then, just pass in the data, and set the scale with the desired breaks:

ggplot(longerTest
       , aes(x = Year
             , y = T)) +
  geom_line() +
  scale_x_continuous(
    breaks = forLabels
    , labels = myLabels
    , trans=reverselog_trans(10)
  )

This gives a plot that has a consistent scale but is labelled far more uniformly.
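
For comparison, a base-graphics version is also possible. This is a rough sketch under a few assumptions: it rebuilds the same simulated data and labels so that it runs on its own, it reverses the axis simply by passing a descending `xlim`, and the axis titles are placeholders of my own choosing.

```r
# Simulated data with the same structure as longerTest above
set.seed(42)
longerTest <- data.frame(
  Year = rep(1:9, times = 6) * rep(10^(3:8), each = 9),
  T = rnorm(6 * 9)
)

# Tick positions and their short text labels ("1k", "2k", ..., "500M")
forLabels <- rep(c(1, 2, 5), times = 6) * rep(10^(3:8), each = 3)
toAppend <- c("0", "00", "k", "0k", "00k", "M", "0M", "00M")
myLabels <- paste0(
  substr(as.character(forLabels), 1, 1),
  toAppend[floor(log10(forLabels) + 1e-9)]  # small offset guards against rounding
)

# log = "x" gives the log scale; a descending xlim runs it from old to recent
plot(longerTest$Year, longerTest$T, type = "l", log = "x",
     xlim = rev(range(longerTest$Year)), xaxt = "n", bty = "l",
     xlab = "Years ago", ylab = "T")
axis(1, at = forLabels, labels = myLabels)
```

Base graphics may silently skip tick labels that would overlap, so the set of labels drawn can depend on the size of the plotting device.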

If you want colors, you can do that using cut:

ggplot(longerTest
       , aes(x = Year
             , y = T
             , col = cut(log10(Year)
                         , breaks = c(3,6,9)
                         , labels = c("Thousands", "Millions")
                         , right = FALSE # closed on the left, so 10^6 counts as "Millions"
                         , include.lowest = TRUE)
             , group = 1
             )) +
  geom_line() +
  scale_x_continuous(
    breaks = forLabels
    , labels = myLabels
    , trans=reverselog_trans(10)
  ) +
  scale_color_brewer(palette = "Set1"
                     , name = "How long ago?")

enter image description here
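
When binning with cut, it is worth checking which side of each interval is closed, because values that sit exactly on a break (one million, in this example) can end up in either group. The short sketch below uses made-up exponents purely to illustrate the difference between the default and `right = FALSE`.

```r
# Exponents (log10 of Year); 6 corresponds to exactly one million years.
exponents <- c(3, 5.5, 6, 8.9)

# Default (right = TRUE): intervals are [3,6] and (6,9], so 6 is "Thousands".
default_bins <- cut(exponents, breaks = c(3, 6, 9),
                    labels = c("Thousands", "Millions"),
                    include.lowest = TRUE)

# right = FALSE: intervals are [3,6) and [6,9], so 6 is "Millions".
left_closed_bins <- cut(exponents, breaks = c(3, 6, 9),
                        labels = c("Thousands", "Millions"),
                        right = FALSE, include.lowest = TRUE)

print(data.frame(exponents, default_bins, left_closed_bins))
```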

Here is a version using facet_wrap to create different scales. I used 6 here, but you can set whatever thresholds you want instead.

longerTest$Period <-
  cut(log10(longerTest$Year)
      , breaks = c(3, 4, 5, 6, 7, 8, 9)
      , labels = trimws(paste(rep(c("", "Ten", "Hundred"), times = 2)
                              , rep(c("Thousands", "Millions"), each = 3) ))
      , right = FALSE # closed on the left, so 10^4 belongs to "Ten Thousands"
      , include.lowest = TRUE)

longerTest$Period <-
  factor(longerTest$Period
        , levels =  rev(levels(longerTest$Period)))


newBreaks <-
  rep(c(2,4,6,8, 10), times = 6) * rep(10^(3:8), each = 5)

newLabels <-
  paste0(
    substr(as.character(newBreaks), 1, 1)
    , toAppend[floor(log10(newBreaks))]
  )

ggplot(longerTest
       , aes(x = Year
             , y = T
       )) +
  geom_line() +
  facet_wrap(~Period, scales = "free_x") +
  scale_x_reverse(
    breaks = newBreaks
    , labels = newLabels
  )

gives:

enter image description here

Mark Peterson
  • Thanks. Please see my comment to @Henrik above for the rationale underlying my figure. Students will have context for the scale. – Michael S Taylor Jan 04 '17 at 15:57
  • In this case, I would argue that it is teaching them (arguably) bad data visualization practices then. Using a log scale like this may be a good way of transitioning to show them how to look over a large data range (log scale) and the limitations of Excel for complex visualization. If you absolutely must do it with separate scales, I think that going with a `facet_wrap` approach (perhaps combined with setting breaks/labels manually) is the right way to go to keep the plots a little separated from each other. – Mark Peterson Jan 04 '17 at 16:06
  • I agree completely with your argument re visualization but this is not a visualization exercise. It is a climate change exercise. The purpose is to show that global temperatures have had large fluctuations over time. In another part of the exercise they will learn that the current *rate* of change is faster than rates estimated for the past. – Michael S Taylor Jan 04 '17 at 16:21
  • Every time you visualize data, it is a data visualization exercise. Without knowing the context of the course, I will still guess that many of the students may not take much else in the way of stats. Exercises like this one will be the bulk of how they learn to visualize data, and I would be very wary to show them *bad* examples. If you want to replicate the separate panels, I've added a version to do that now with `facet_wrap`. – Mark Peterson Jan 04 '17 at 16:32
  • I've decided to go with your suggested log scale route. Is it possible to color the line with different colors above and below the zero line (e.g., red above, blue below)? – Michael S Taylor Jan 04 '17 at 22:08
  • Yes, take a look at the syntax in the color version. You should be able to use `ifelse(T>0)` to get the distinction you want. Lots of similar questions on SO with more details to modify from there. – Mark Peterson Jan 04 '17 at 22:17
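
One way the `ifelse` suggestion in the last comment might look is sketched below. It is an illustration only: the `Sign` column name and the colour choices are my own assumptions, the data are simulated, and a plain `scale_x_log10()` is used instead of the reversed log scale to keep the example self-contained.

```r
library(ggplot2)

# Simulated data with the same structure as longerTest in the answer
set.seed(1)
longerTest <- data.frame(
  Year = rep(1:9, times = 6) * rep(10^(3:8), each = 9),
  T = rnorm(6 * 9)
)

# Hypothetical helper column: classify each point by the sign of T
longerTest$Sign <- ifelse(longerTest$T > 0, "Above zero", "Below zero")

# group = 1 keeps a single connected line while the colour varies along it
p <- ggplot(longerTest, aes(x = Year, y = T, colour = Sign, group = 1)) +
  geom_line() +
  scale_x_log10() +
  scale_colour_manual(values = c("Above zero" = "red", "Below zero" = "blue"),
                      name = NULL)
print(p)
```

Each segment takes the colour of the point it starts from, so the colour switches at data points rather than exactly where the line crosses zero. With dense data the difference is small; with sparse data it may be noticeable.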

Here is a start:

#define the panels
breaks <- c(-Inf, 8, 80, Inf)
test$panel <- cut(test$Year, breaks, labels = FALSE)
test$panel <- ordered(test$panel, levels = unique(test$panel))

#for correct scales
dummydat <- data.frame(Year = c(0, 8, 8, 80, 80, max(test$Year)),
                       T = mean(test$T),
                       panel = ordered(rep(1:3, each = 2), levels = levels(test$panel)))

library(ggplot2)
ggplot(test, aes(x = Year, y = T, color = panel)) +
  geom_line() +
  geom_blank(data = dummydat) + #for correct scales
  facet_wrap(~ panel, nrow = 1, scales = "free_x") +
  theme_minimal() + #choose a theme you like
  theme(legend.position = "none", #and customize it
        panel.spacing.x = unit(0, "cm"),
        strip.text = element_blank() , 
        strip.background = element_blank()) +
  scale_x_reverse(expand = c(0, 0))

resulting plot

Roland

Here's a basic example of doing it with separate plots using gridExtra. This may be useful to combine with extra grobs, for instance to create the epoch boxes across the top (not done here). If so desired, this might be best combined with Roland's solution.

# ggplot with gridExtra
library('ggplot2')
library('gridExtra')
library('grid')

d1 <- test[1:5, ]
d2 <- test[6:11, ]
d3 <- test[12:16, ]

# x is the row index, so points are evenly spaced within each panel.
# Assumed intent of the unfinished breaks= argument: label each index with its year.
plot1 <- ggplot(d1, aes(y = T, x = seq_len(nrow(d1)))) +
  geom_line() +
  ylim(c(-5, 15)) +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        plot.margin = unit(c(1,0,1,1), "cm")) +
  scale_x_continuous(breaks = seq_len(nrow(d1)), labels = d1$Year)

plot2 <- ggplot(d2, aes(y = T, x = seq_len(nrow(d2)))) +
  geom_line() +
  ylim(c(-5, 15)) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        plot.margin = unit(c(1,0,1,0), "cm")) +
  scale_x_continuous(breaks = seq_len(nrow(d2)), labels = d2$Year)

plot3 <- ggplot(d3, aes(y = T, x = seq_len(nrow(d3)))) +
  geom_line() +
  ylim(c(-5, 15)) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        plot.margin = unit(c(1,0,1,0), "cm")) +
  scale_x_continuous(breaks = seq_len(nrow(d3)), labels = d3$Year)

# put together
grid.arrange(plot1, plot2, plot3, nrow = 1,
             widths = c(1.5,1,1)) # allow extra width for first plot which has y axis

enter image description here
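
The idea of plotting against evenly spaced positions rather than the raw years can also be applied in a single base-graphics panel. The sketch below is one possible way to do it, not a method taken from any of the answers: `to_pos` is a hypothetical helper that uses `approx()` to interpolate linearly between the chosen tick years, so that every tick lands at an equally spaced position. It uses the sample data from the question.

```r
# Sample data from the question (millions of years ago, temperature anomaly)
years <- c(500,400,300,200,100,60,50,40,30,20,10,5,4,3,2,1)
temps <- c(13.66, 8.6, -2.16, 3.94, 8.44, 5.28, 12.98, 8.6, 5, 5.34,
           3.66, 2.65, 0.78, 0.25, -1.51, -1.77)

# Years that should sit on evenly spaced ticks, oldest first
tick_years <- years
tick_pos <- seq_along(tick_years)

# Hypothetical helper: map any year to its position between neighbouring ticks.
# approx() is given the ticks in increasing order of year.
to_pos <- function(y) {
  approx(x = rev(tick_years), y = rev(tick_pos), xout = y)$y
}

plot(to_pos(years), temps, type = "l", xaxt = "n", bty = "l",
     ylim = c(-5, 15), xlab = "Millions of years ago",
     ylab = "Temperature anomaly (degrees C)")
axis(1, at = tick_pos, labels = tick_years)
```

With the full data set, points that fall between two ticks (450, say) are placed proportionally between them, so the scale is linear within each interval but changes from one interval to the next. That is the same property of the original figure that the comments caution about.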

JohnSG