For anything related to processing data in R, I've recently been seeing tidyverse recommended as almost essential. This raises a question - if it is all that it's hyped up to be, is there any reason not to use it? For example, are the frameworks in tidyverse restrictive in any way that is worthy of mention?

J. Mini
  • @IanCampbell I believe that's only one small part of tidyverse. – J. Mini Apr 23 '20 at 18:08
  • See also: Norman Matloff's excellent essay, [Tidyverse Skeptic](https://github.com/matloff/TidyverseSkeptic). – Len Greski Apr 24 '20 at 00:10
  • @LenGreski That's excellent. Above all else, I'm very glad to have it confirmed that ggplot2 has nothing to do with stuff like tibbles. – J. Mini Apr 24 '20 at 14:11

1 Answer

First drawback: stability

One drawback is that tidyverse functions change more rapidly than, say, base R functions. So if you want stability over a long time, I would go for base R. That said, the tidyverse developers are open about their different approach. See e.g. the Welcome to the Tidyverse vignette:

the biggest difference [between base R and tidyverse] is in priorities: base R is highly focussed on stability, whereas the tidyverse will make breaking changes in the search for better interfaces.

...and Hadley's answer to "Do you expect the tidyverse to be the part of core R packages some day?":

It’s extremely unlikely because the core packages are extremely conservative so that base R code is stable, and backward compatible. I prefer to have a more utopian approach where I can be quite aggressive about making backward incompatible changes while trying to figure out a better API.

Second drawback: flexibility

The tidy data concept is great, but the requirement that a transformation return the same number of rows as its input (see mutate) cannot always be met. See for example

library(tidyverse)
data.frame(matrix(rnorm(1000), ncol = 10)) %>%
  mutate_all(function(i) density(i)$x)

which gives an error because the number of rows changes. I sometimes run into situations like this, where mutate complains that the row count is not the same. It is similar with summarise, which expects a single value per column; that is not the case for range, for instance. There are workarounds, for sure, but I prefer base R, which here would simply be

apply(data.frame(matrix(rnorm(1000), ncol = 10)),
      2,
      function(i) density(i)$x)
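To make the row-count point concrete, here is a minimal base R sketch. It assumes the same 100 x 10 random data as above; the seed and the names `dens_x` and `ranges` are added here only for illustration. It shows why the shapes cannot line up: `density()` evaluates on 512 grid points by default, and `range()` returns two values.

```r
# Base R only; the column count and sample size mirror the example above.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(matrix(rnorm(1000), ncol = 10))

# density() evaluates the estimate on a grid of n = 512 points by default,
# so each 100-value column becomes a 512-value vector.
dens_x <- vapply(df, function(col) density(col)$x, numeric(512))
dim(dens_x)  # 512 10

# range() returns two values per column (min and max).
ranges <- vapply(df, range, numeric(2))
dim(ranges)  # 2 10
```

`vapply` is used instead of `apply` here only because it checks the declared result length. On the tidyverse side, newer dplyr releases (1.1.0 and later) offer `reframe()` for results of arbitrary length, which may cover some of these cases.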

Third drawback: complexity

There are situations where the tidyverse works but is much more cumbersome. Some time ago I asked how to write this code

df[df$age > 90, ] <- NA

... within the tidyverse and the two answers suggested using

df %>%
  select(x, y, age) %>%
  mutate_all(~ replace(.x, age > 90, NA))
# or
df %>%
  mutate_all(function(i) replace(i, .$age > 90, NA))

Both answers work, but neither is as quick to write as the base R version.
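For reference, the base R assignment can be tried on a toy data frame. This is only a sketch: the data are made up, the column names x, y and age follow the snippets above, and `too_old` is a helper name introduced here.

```r
# Hypothetical toy data; the column names x, y and age follow the snippets above.
df <- data.frame(
  x   = 1:4,
  y   = c("a", "b", "c", "d"),
  age = c(25, 95, 40, 91)
)

# Logical row index: TRUE where age exceeds 90.
too_old <- df$age > 90

# Assigning NA to those rows blanks every column at once.
df[too_old, ] <- NA

df
```

Rows 2 and 4 end up as NA in every column, while the other rows and the column types are left unchanged.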

Fourth drawback: limitation

If you want to define your own function, you write something like my_fun <- function(x) ..., where function itself is a base R function which, to my knowledge, has no tidyverse counterpart. There are many base R functions that have no tidyverse equivalent and probably never will, e.g. rnorm, eval, c, and so on. In fact, this is not so much a drawback of the tidyverse as a sign that the tidyverse and base R are good at different things, which is why you should learn both.

Why this question should not be closed

The question was closed as a duplicate and linked to another about tidyverse vs. data.table. In my opinion, asking about the disadvantages of tidyverse (or any other package) does not mean asking for a comparison with the data.table package. Instead, the more obvious way to show the disadvantages of tidyverse is to compare it with base R, which the linked question does not do, i.e. this question is not a duplicate.

bathyscapher
  • Good explanation. A lot of tidyverse functions are also way slower than their base R counterparts. Another point: the replication of base R functions sort of splits the language. It is also one of the reasons I love R so much (that there are many ways to skin a cat), but the duplication makes learning R a matter of learning a lot of functions. I'm sure you've seen this where questions are answered using base R, tidyverse and data.table, say. When I worked as a university tutor, undergrad students sometimes had difficulty coping with the sheer number of ways to do something. – hello_friend May 04 '20 at 12:55