tidyr::pop_quiz: is there a faster/more transparent way to reshape the anscombe dataset?

Question

I'm trying to get good with tidyr. Is there a better way to prep the anscombe dataset for plotting with ggplot2? Specifically, I don't love having to add data (obs_num). How would you do this?

library(tidyverse)
library(datasets)

anscombe %>%
  mutate(obs_num = 1:n()) %>%
  gather(variable, value, -obs_num) %>%
  separate(variable, c("variable", "set"), 1) %>%
  spread(variable, value) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~set)

I wonder how many people checked to see if `pop_quiz` is really a function in *tidyr*. I did. — Rich Scriven, Oct 19 '16 at 17:24

score 2 · Accepted Answer · edited May 23 '17 at 12:00


I think you need to add the extra column in order to uniquely identify each observation in the call to `spread`. Hadley discusses this in a comment on this SO question. Another approach would be to stack the x and y columns separately, as in the code below, but I don't see why that would be any better than your version. In fact, it could be worse, because the two halves are matched by row position alone and nothing guards against the x and y values ending up out of correspondence:

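# Stack the x columns and the y columns separately, then glue them side by side.
# bind_cols() matches rows purely by position, so both halves must keep the same order.
bind_cols(anscombe %>% select(matches("x")) %>% gather(set, "x"),
          anscombe %>% select(matches("y")) %>% gather(key, "y")) %>%
  select(-key) %>%                       # drop the redundant y1..y4 labels
  mutate(set = gsub("x", "Set: ", set))  # "x1" -> "Set: 1"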
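To illustrate the first point above (why `spread` needs the extra column), here is a minimal sketch that runs the question's pipeline without `obs_num` and captures the failure. The name `result` is just for illustration, and the exact error text varies between tidyr versions, so only the fact that an error occurs is checked.

```r
library(dplyr)
library(tidyr)

# Same steps as the question's pipeline, minus the obs_num column.
# After separate(), 11 rows share every (set, variable) combination,
# so spread() has no way to decide which x belongs with which y.
result <- tryCatch(
  anscombe %>%
    gather(variable, value) %>%
    separate(variable, c("variable", "set"), 1) %>%
    spread(variable, value),
  error = function(e) e  # return the condition object instead of stopping
)

inherits(result, "error")  # TRUE: spread() refuses the duplicated keys
```

Adding `obs_num` back gives each (observation, set) pair its own row, which is what lets `spread` rebuild matching x and y columns.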

Another option would be to use base reshape, which is more succinct:

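# varying = 1:8 marks all eight columns as repeated measures; sep = "" lets
# reshape() split names such as "x1" into the variable "x" and the set 1.
anscombe %>%
  reshape(varying = 1:8, direction = "long", sep = "", timevar = "set")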
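For comparison, tidyr 1.0.0 and later provide `pivot_longer()`, whose `".value"` sentinel handles paired columns in one step. The sketch below assumes such a version is installed; the object names (`long_base`, `long_tidyr`, and the two sorted copies) are hypothetical and only there to check that both routes yield the same 44 rows.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# Base R: long format with columns set, x, y plus an id column.
long_base <- reshape(anscombe, varying = 1:8, direction = "long",
                     sep = "", timevar = "set")

# tidyr: the first character of each name ("x" or "y") becomes an output
# column via ".value"; the second character becomes the value of `set`.
long_tidyr <- anscombe %>%
  pivot_longer(everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.)(.)")

# The two results differ in row order and in the type of `set`
# (numeric vs. character), so normalise both before comparing.
base_sorted <- long_base %>%
  mutate(set = as.character(set)) %>%
  select(set, x, y) %>%
  arrange(set, x, y)
tidyr_sorted <- long_tidyr %>%
  arrange(set, x, y)

all.equal(as.data.frame(base_sorted), as.data.frame(tidyr_sorted),
          check.attributes = FALSE)
```

Either result can be passed straight to `ggplot(aes(x = x, y = y))` with `facet_wrap(~set)`, and neither needs a manually added `obs_num` column.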


answered Oct 19 '16 at 17:21

eipi10

`reshape` is mysterious and powerful! fantastic one line solution, and I'm not convinced that the tidyverse solution is any less opaque in this case. – Alex Coppock Oct 19 '16 at 17:50

Yes, I find base `reshape` mysterious as well. It would be nice if `tidyr` could similarly deal with multiple pairs of corresponding columns. – eipi10 Oct 19 '16 at 17:52
