I want to write a ggplot2 visualization as a function and then apply that function to each row of a data frame. (I want to use apply to avoid a for loop, as suggested here.)

The data:

library(ggplot2)
point1 <- c(1,2)
point2 <- c(2,2)

points <- as.data.frame(rbind(point1, point2))

I saved points as a data frame and it runs fine in ggplot2:


ggplot(data = points) +
    geom_point(aes(x = points[, 1], y = points[, 2])) +
    xlim(-3, 3) +
    ylim(-3, 3) +
    theme_bw()

That's not really the plot I want though: I would like two plots, each one with one point.

Now I build a function that I intend to call on each row of the data frame:


plot_data <- function(data) {
  ggplot(data) +
    geom_point(aes(x = data[, 1], y = data[, 2])) +
    xlim(-3, 3) +
    ylim(-3, 3) +
    theme_bw()
}

I create a list to store the plots:

myplots <- list()

And here is the call to apply, following this suggestion:

myplots <- apply(points, 1, plot_data)

But I get the following error:

#> Error: `data` must be a data frame, or other object coercible by `fortify()`,
#> not a numeric vector

But my data are a data frame.

Is this because: "apply() will try to convert the data.frame into a matrix (see the help docs). If it does not gracefully convert due to mixed data types, I'm not quite sure what would result" as noted in a comment to the answer I referred to?

Still, if I check the data class after the call to apply, the data are still a dataframe:

class(points)
#> [1] "data.frame"

Created on 2021-04-09 by the reprex package (v0.3.0)
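
One way to check what is going on is to look at what `apply` actually passes to the function, as opposed to what `points` is outside the call. This is only a diagnostic sketch in base R on the same toy data; the name `row_classes` is made up for illustration.

```r
# Same toy data as in the question
point1 <- c(1, 2)
point2 <- c(2, 2)
points <- as.data.frame(rbind(point1, point2))

# What FUN receives inside apply(): each row arrives as a plain numeric vector,
# so every entry of row_classes is "numeric"
row_classes <- apply(points, 1, class)
row_classes

# The object in the workspace is never modified, so this is still a data frame
class(points)

# apply() works on a matrix version of the data frame, roughly like this
str(as.matrix(points))
```

So `class(points)` reporting a data frame and `ggplot()` complaining about a numeric vector are both consistent: the conversion only happens to the copy that `apply` iterates over.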

Emy
  • `apply` is a poor choice here. `apply` is built to work on matrices, and when you give `apply` a data frame, the very first thing it does is convert it to a matrix, as you say. This conversion happens inside the `apply` function. Your original data is still a data frame and still unchanged (unless you assign something to it, e.g. `points <- apply(...)`). So, if you *really* want to use `apply`, you can convert the vector row of the matrix that `apply` works with back to a data frame, but this is quite inefficient. – Gregor Thomas Apr 09 '21 at 15:57
  • Thank you for the explanation. What would you recommend, as an alternative to `apply`? – Emy Apr 09 '21 at 15:59
  • If you're allergic to `for` loops and open to other `*apply` family functions, you could use `lapply` across the row indexes, something like `myplots <- lapply(1:nrow(points), function(i) plot_data(points[i, ]))`. Or a for loop, `for(i in 1:nrow(points)) {myplots[[i]] <- plot_data(points[i, ])}`. – Gregor Thomas Apr 09 '21 at 15:59
  • Thanks a lot, it worked. I tried `lapply` before, but what confused me is that, reading the syntax `lapply(X, FUN, …)`, I thought (1) that `X` should be the data (meaning the whole data.frame) and (2) that I would just need to pass `plot_data` as `FUN`. Where can I learn more about the construction you used? – Emy Apr 09 '21 at 16:13
  • I don't really know a good source for info on that construction... it's more just familiarity with `*apply` family functions and R data structures. I really like [this FAQ](https://stackoverflow.com/q/3505701/903061) for understanding the *apply family functions generally. And of course the R for Data Science [chapter on iteration](https://r4ds.had.co.nz/iteration.html) is good, and includes an introduction to `purrr`'s variants of these functions. – Gregor Thomas Apr 09 '21 at 17:34
  • Whatever `X` is is what `lapply` will iterate over, and because data frames are built as lists of columns, if you `lapply(some_data_frame, foo)` the iteration will be over columns. E.g., `lapply(mtcars, mean)`, `lapply(iris, class)`. – Gregor Thomas Apr 09 '21 at 17:34
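
To make the point of the last few comments concrete, here is a small base R sketch contrasting the two iterations. The names `col_classes` and `rows` are made up for illustration, and `seq_len()` plus `drop = FALSE` are defensive choices added here, not something the comments asked for.

```r
# Same toy data as in the question (columns are auto-named V1 and V2)
points <- as.data.frame(rbind(point1 = c(1, 2), point2 = c(2, 2)))

# lapply() on a data frame walks its columns, one list element per column
col_classes <- lapply(points, class)
names(col_classes)  # "V1" "V2"

# To walk rows instead, iterate over the row indices and subset inside FUN.
# seq_len() also behaves sensibly for a zero-row data frame, and drop = FALSE
# keeps the result a data frame even if there is only one column.
rows <- lapply(seq_len(nrow(points)), function(i) points[i, , drop = FALSE])
rows[[1]]
```

Each element of `rows` is a one-row data frame, which is the kind of input a function like `plot_data` can hand to `ggplot()`.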

1 Answer

As suggested by Gregor Thomas in the comments:

library(ggplot2)
point1 <- c(1, 2)
point2 <- c(2, 2)

points <- as.data.frame(rbind(point1, point2))

plot_data <- function(data) {
  ggplot(data) +
    geom_point(aes(x = data[, 1], y = data[, 2])) +
    xlim(-3, 3) +
    ylim(-3, 3) +
    theme_bw()
}
myplots <- list()
myplots <- lapply(1:nrow(points), function(i) plot_data(points[i, ]))
myplots
#> [[1]]

#> 
#> [[2]]

Created on 2021-04-09 by the reprex package (v0.3.0)

Emy
  • You don't need `myplots <- list()` when using lapply. If you have data that is separated by row, why don't you arrange it in a list in the first place? You can then loop over the list more easily with all the apply functions, purrr, and many more. I'd create a list by splitting your data frame by rows and then iterate over each data frame. – tjebo Apr 09 '21 at 21:10
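
The splitting approach from the last comment could look roughly like the sketch below. It uses base R `split()` with one group per row; the name `rows` is made up for illustration, and the final plotting call is left as a comment because it relies on `plot_data` and ggplot2 from the answer above.

```r
# Same toy data as in the question
points <- as.data.frame(rbind(point1 = c(1, 2), point2 = c(2, 2)))

# One group per row: split() returns a list of one-row data frames
rows <- split(points, seq_len(nrow(points)))
length(rows)

# Every element is still a data frame, so it can go straight to ggplot()
vapply(rows, is.data.frame, logical(1))

# With plot_data() defined as in the answer, the plots would then be:
# myplots <- lapply(rows, plot_data)
```

Since `lapply` builds and returns the list itself, there is no need to create `myplots` beforehand.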