Reference current data.frame in dplyr flow

Question

How can I reference the current data.frame in a dplyr flow? As an example, in

library(dplyr)

myresults = tribble(
  ~dataset_name, ~method_group, ~method, ~value,
  'iris',        'other',       'a',     1,
  'wine',        'other',       'b',     2,
  'iris',        'mine',        'c',     3,
  'wine',        'mine',        'd',     4
)

myresults %>%
  mutate(dataset_name='datasets aggregated') %>%
  bind_rows(XXX %>% filter(method=='c') %>% mutate(method_group = 'other'))

I would like to row-bind the current data.frame with a modified copy of itself. What do I write instead of the XXX?

Inside the function do(), the answer seems to be the dot placeholder (.). I would prefer not to have to use do, but I managed to get the desired result with

myresults %>%
  mutate(dataset_name='datasets aggregated') %>%
  do(bind_rows(data.frame(.), data.frame(.) %>% filter(method=='c') %>% mutate(method_group = 'other')))

but this is not very elegant.

My R version is:

> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          4.4                         
year           2018                        
month          03                          
day            15                          
svn rev        74408                       
language       R                           
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On

Instead of chaining everything together, why don't you try two separate computations and then bind the results? A short example with desired output will help. — aichao, Sep 22 '18 at 14:46
@aichao: Because the second line with "mutate(dataset_name..." would have to be repeated. Imagine a more complex scenario. We could build intermediate data.frames instead, but that would clog the namespace and thus the little RStudio pane that displays all my data.frames. Of course the code above is only a small part of a large project, so things add up. — Make42, Sep 22 '18 at 14:55
@Make42 In general, it is good to provide a small input data set (this one can be very small) and an expected output data set, which should ensure that everyone is on the same page. — steveb, Sep 22 '18 at 15:10

score 6 · Accepted Answer · answered Sep 22 '18 at 16:27

Three options that I see:

Move the . inside filter. The pipe still inserts the piped data as the first argument of bind_rows, so both arguments refer to the same data:

myresults %>%
  mutate(dataset_name='datasets aggregated') %>%
  bind_rows(filter(., method=='c') %>% mutate(method_group = 'other'))
# # A tibble: 5 x 4
#   dataset_name        method_group method value
#   <chr>               <chr>        <chr>  <dbl>
# 1 datasets aggregated other        a          1
# 2 datasets aggregated other        b          2
# 3 datasets aggregated mine         c          3
# 4 datasets aggregated mine         d          4
# 5 datasets aggregated other        c          3
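
A variant of the same idea, sketched here under the assumption that the magrittr pipe re-exported by dplyr is in use: wrapping the right-hand side in braces stops %>% from inserting the piped data as the first argument, so every use of . is spelled out. The name combined is only for illustration.

```r
library(dplyr)

myresults <- tribble(
  ~dataset_name, ~method_group, ~method, ~value,
  'iris',        'other',       'a',     1,
  'wine',        'other',       'b',     2,
  'iris',        'mine',        'c',     3,
  'wine',        'mine',        'd',     4
)

# Inside the braces, . refers to the data produced by the previous step,
# and nothing is inserted implicitly as a first argument.
combined <- myresults %>%
  mutate(dataset_name = 'datasets aggregated') %>%
  {
    bind_rows(
      .,
      filter(., method == 'c') %>% mutate(method_group = 'other')
    )
  }

combined
```

This is a little more verbose than relying on the implicit first argument, but it makes both references to the current data visible at a glance.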

Use a temporary variable, mid-pipe:

z <- myresults %>% mutate(dataset_name='datasets aggregated')
bind_rows(z, z %>% filter(method=='c') %>% mutate(method_group = 'other'))
# # A tibble: 5 x 4
#   dataset_name        method_group method value
#   <chr>               <chr>        <chr>  <dbl>
# 1 datasets aggregated other        a          1
# 2 datasets aggregated other        b          2
# 3 datasets aggregated mine         c          3
# 4 datasets aggregated mine         d          4
# 5 datasets aggregated other        c          3

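Similar to your do implementation. (You don't need data.frame(.), which is a little redundant, but you cannot simply write . %>% filter(...) either: magrittr treats a pipe that starts with a bare . as a function definition rather than evaluating it, so assign . to a name first.)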

myresults %>%
  mutate(dataset_name='datasets aggregated') %>%
  do({dat <- .; bind_rows(dat, dat %>% filter(method=='c') %>% mutate(method_group = 'other'))})
# # A tibble: 5 x 4
#   dataset_name        method_group method value
#   <chr>               <chr>        <chr>  <dbl>
# 1 datasets aggregated other        a          1
# 2 datasets aggregated other        b          2
# 3 datasets aggregated mine         c          3
# 4 datasets aggregated mine         d          4
# 5 datasets aggregated other        c          3
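
A minimal sketch of why a bare . at the start of a nested pipe misbehaves, and a likely source of the "not a fseq/function" error quoted in the comments. It assumes the magrittr pipe attached by dplyr; keep_c and small are hypothetical names used only for this illustration.

```r
library(dplyr)

# A pipe whose left-hand side is a bare . is not evaluated on any data.
# magrittr returns a function (a "functional sequence") instead.
keep_c <- . %>% filter(method == 'c')

is.function(keep_c)        # a function, not a data frame
inherits(keep_c, 'fseq')   # carries the class named in the error message

# The resulting function can be applied to a data frame later.
small <- tibble(method = c('a', 'c'), value = c(1, 3))
keep_c(small)
```

Passing such an object to bind_rows is what fails, which is why forms like dat <- . or filter(., ...) avoid the problem: neither starts a pipe with a bare dot.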

I like 1, since it is short and does not produce further data.frames. Regarding 3: If I do not use `data.frame(.)`, I get an error saying that a data.frame is expected as the second argument, not a sequence (or something like that). — Make42, Sep 22 '18 at 16:49
Odd that you get an error and I do not. Are you getting that error with the sample data from the question or just from your actual data? — r2evans, Sep 22 '18 at 17:17
Also with the sample data that I posted here: "Argument 2 must be a data frame or a named atomic vector, not a fseq/function" — Make42, Sep 22 '18 at 17:53
I'm using `dplyr-0.7.6` on R-3.5.1-x86_64-w64-mingw32, perhaps there's a relevant version difference in either of those. — r2evans, Sep 22 '18 at 17:55
(`R.version` gives you the R version.) There are [many changes to `dplyr` since 0.7.4](https://github.com/tidyverse/dplyr/blob/master/NEWS.md), and I don't know if or which of them might be the culprit. — r2evans, Sep 22 '18 at 18:00
I added the version to the question post. Me neither, but tonight is not the night for finding out ;-). — Make42, Sep 22 '18 at 18:06
