
I don't understand how the new native pipe placeholder works. Before R 4.2, the native pipe had no placeholder, so you needed a dedicated anonymous function to pass the piped object to any argument other than the first. Since R 4.2, the native pipe has a dedicated placeholder too: `_`. I'm also aware that this placeholder only works when the argument receiving it is explicitly named: R 4.2.0 Native Placeholder. However, I'm still having trouble and can't work out how to use it.

I'll give you an example. I wrote a simple piped code chunk that takes an object and returns how many missing values there are in each column.

x = c(NA, NA, 1, NA, 1, 2)
m = matrix(x, nrow = 3, ncol = 2)
m

#      [,1] [,2]
# [1,]   NA   NA
# [2,]   NA    1
# [3,]    1    2


#### CHECK FOR MISSING VALUES ####
m |> 
  { \(.) .colSums(is.na(.), NROW(.), NCOL(.)) }() |> 
  { \(sum.NA) rbind(names(m), sum.NA) }() |> 
  t()

#      sum.NA
# [1,]      2
# [2,]      1

The code above uses the anonymous function method and works nicely. I can't work out how to rewrite it so that it uses the new placeholder properly. Do you have any suggestions?

Bezzus
  • I don't think the `_` placeholder will work here in the case of `.colSums`, because it can only be used once per function call (it cannot be passed to multiple arguments) – Allan Cameron Jun 02 '22 at 10:39
  • @AllanCameron that's unfortunate. I guess the magrittr pipe is still better than the new native placeholder. Hope it will be improved in the future. – Bezzus Jun 02 '22 at 10:49
  • It has been kept deliberately simple, and is just not as sophisticated as the magrittr pipe. There's a good summary of the differences [here](https://r4ds.hadley.nz/workflow-pipes.html#vs), which also gives some advice about when to use which pipe operator. – Allan Cameron Jun 02 '22 at 10:53
  • Note that the new pipe works at the parser level. So when you run `quote(m |> is.na())` you'll see that the code is turned into `is.na(m)`. There is no memory of the pipe in the parsed abstract syntax tree. (Compare `quote(m %>% is.na())`, where `%>%` is actually a function.) Basically the new pipe is more like syntactic sugar. It rewrites the code and thus can run faster, because there's no pipe code left to run at all after the transformation takes place. – MrFlick Jun 02 '22 at 13:08

2 Answers


The placeholder was introduced in R 4.2.0. From R News, CHANGES IN R 4.2.0, section NEW FEATURES (my emphasis):

  • In a forward pipe |> expression it is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.

In other words, the placeholder can be used only once, and only as a named argument of the rhs call.

Though the blog post linked in the question mentions the named-argument requirement and gives right and wrong ways of using the placeholder, it does not mention that the placeholder can be used only once.
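The two rules can be tried out as follows. This is a minimal sketch, assuming R >= 4.2.0; the `lm()` and `rep()` calls are just illustrative choices, not taken from the question. The invalid forms are rejected by the parser, so they are kept as strings and parsed inside `try()` to let the script run to the end.

```r
# Assumes R >= 4.2.0.

# Valid: `_` appears once and is bound to a named argument.
fit  <- mtcars |> lm(mpg ~ wt, data = _)
reps <- c(1, 2) |> rep(times = 2, x = _)
reps
#> [1] 1 2 1 2

# Invalid: placeholder without an argument name (a parse error).
unnamed <- try(parse(text = "c(1, 2) |> rep(_, times = 2)"), silent = TRUE)

# Invalid: placeholder used twice in the same rhs call (a parse error).
twice <- try(parse(text = "c(1, 2) |> rep(x = _, times = _)"), silent = TRUE)

inherits(unnamed, "try-error")
inherits(twice, "try-error")
```

Both `inherits()` calls should return `TRUE`, since neither string gets past the parser.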


In the question's case, there is no need to use the new placeholder `_`.

x = c(NA, NA, 1, NA, 1, 2)
m = matrix(x, nrow = 3, ncol = 2)
m
#>      [,1] [,2]
#> [1,]   NA   NA
#> [2,]   NA    1
#> [3,]    1    2

m |>
  is.na() |>
  colSums() |>
  matrix(dimnames = list(NULL, 'sum.NA'))
#>      sum.NA
#> [1,]      2
#> [2,]      1

Created on 2022-06-02 by the reprex package (v2.0.1)


Another way, one function per step, this time using the placeholder.
(I only remembered to use cbind after reading Gabor's answer.)

m |>
  is.na() |>
  colSums() |>
  cbind(sum.NA = _)
#>      sum.NA
#> [1,]      2
#> [2,]      1

Created on 2022-06-02 by the reprex package (v2.0.1)

Rui Barradas
  • I thank you for your answer. My code may not be optimal, I'm not an R expert, and your code is certainly better than mine. However my code was just an example. I'm not asking how to change that chunk into using a better syntax. I'm trying to understand how the new placeholder works. That's why I asked how to convert my code into using the new placeholder, and not just how to better write that chunk. – Bezzus Jun 02 '22 at 10:46
  • @Bezzus Added code and an explanation. Hope it makes it more clear. – Rui Barradas Jun 02 '22 at 13:08
  • Yes. Now it's more clear. Thank you! – Bezzus Jun 02 '22 at 17:27

You will need to restructure this a bit to take advantage of `_`. The placeholder does not directly address the problem of using the LHS multiple times on the RHS, nor the problem of nesting functions on the RHS, and the code in the question faces both. Also note that the code in the question refers to `m` again inside the pipeline, which really defeats the left-to-right idea of pipes. Finally, `names(m)` is `NULL` since `m` has no names.

We create a list with a single element named `x` and then use it in the next line, which solves the problem of having to refer to it 3 times and also takes care of the nested calls. In the `rbind` we dropped the reference to `m`, since rbinding `NULL` is pointless. This way we manage to use `_` twice and eliminate all the anonymous functions while keeping mostly to the idea of the code in the question.

m |>
  list(x = _) |>
  with(.colSums(is.na(x), NROW(x), NCOL(x))) |>
  rbind(sum.NA = _) |>
  t()
G. Grothendieck
  • Thank you for your answer. You're right when you say reusing `m` defeats the purpose of piping. Thanks to your answer and that of Rui Barradas I managed to rewrite the code with better syntax and I also better know how the new placeholder works. Regarding the `names()` function I'm aware the matrix `m` has no colnames, but in my original code, this piped chunk of code was meant to be used on a regular dataframe, with proper colnames. `m` is just an example object. – Bezzus Jun 02 '22 at 17:22
  • rbinding the names to the data still has the problem that it coerces the data to character if the names are not NULL. – G. Grothendieck Jun 02 '22 at 18:03
  • Sure but that's not really a problem. It's just a fictional case study data cleaning script. I just needed to show how many missing values there were and print the results on a markdown document. Character or numeric datatype, it made no difference, and even if it did, converting to numeric is really easy. – Bezzus Jun 02 '22 at 18:41
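The coercion discussed in the last comments can be seen on a small data frame. This is a minimal sketch, assuming R >= 4.2.0; `df` is a made-up example, not the asker's real data.

```r
# Assumes R >= 4.2.0. `df` is a hypothetical data frame with column names.
df <- data.frame(a = c(NA, 1, NA), b = c(2, NA, 3))

# Missing values per column, as a named numeric vector.
counts <- df |> is.na() |> colSums()

# rbind() with the names forces everything to character.
as_chr <- rbind(names(df), counts)
is.character(as_chr)

# Binding only the counts keeps them numeric; the column names of `df`
# survive as row names because colSums() returns a named vector.
as_num <- counts |> cbind(sum.NA = _)
is.numeric(as_num)
```

Keeping the names as dimnames rather than as data avoids the need to convert back to numeric later.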