15

I'd like to write a function that works with the pipe operator as exported by dplyr. I am not using magrittr.

df %>% my_function

How can I get the name of df inside the function? If I try

my_function <- function(tbl){print(deparse(substitute(tbl)))}

it returns

[1] "."

while I'd like to have [1] "df"

Any suggestion?

Thank you in advance,
Nicola

  • 4
    While I think this is an interesting question, why do you want to do this? – eddi May 05 '15 at 19:11
  • 1
    @eddi: my_func takes the output of some dplyr manipulation and performs some extra operations to format data for print and export. I'd like to export the data in the object to a csv file with the same name as the object. – Nicola Sturaro Sommacal May 06 '15 at 07:31
  • Well, there is no way that `df %>% stuff %>% my_func` could do that, since by the point it gets to `my_func` it could very well be a different object entirely, and information about leftmost object will be lost, so my suggestion is to simply do `my_func(df)` – eddi May 06 '15 at 12:33
  • 6
    This answer seems to work: https://stackoverflow.com/a/42561430/5028841 or am I missing something? – JBGruber Oct 23 '19 at 22:11
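To make the behaviour described in the question concrete, here is a minimal sketch comparing a direct call with a piped call. It assumes only that magrittr is installed (dplyr re-exports the same `%>%` operator); the variable names `direct` and `piped` are made up for illustration.

```r
library(magrittr)  # dplyr re-exports this same %>% operator

# Same idea as the function in the question, but returning the string
my_function <- function(tbl) deparse(substitute(tbl))

df <- data.frame(a = 1:3)

# Called directly, substitute() sees the symbol df
direct <- my_function(df)

# Through the pipe, the function is called as my_function(.),
# so substitute() only sees the placeholder symbol .
piped <- df %>% my_function

print(direct)
print(piped)
```

This is why eddi's suggestion of calling `my_func(df)` directly is the simplest way to keep the name available.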

5 Answers

5

The SO answer that JBGruber links to in the comments mostly solves the problem. It works by moving upwards through execution environments until a certain variable is found, then returns the lhs from that environment. The only thing missing is the requirement that the function outputs both the name of the original data frame and the manipulated data – I gleaned the latter requirement from one of the OP's comments. For that we just need to output a list containing these things, which we can do by modifying MrFlick's answer:

get_orig_name <- function(df){
    i <- 1
    while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
        i <- i+1
    }
    list(name = deparse(parent.frame(i)$lhs), output = df)
}

Now we can append get_orig_name to the end of any pipeline to get the manipulated data and the original data frame's name in a list. We access both using $:

mtcars %>% summarize_all(mean) %>% get_orig_name

#### OUTPUT ####

$name
[1] "mtcars"

$output
       mpg    cyl     disp       hp     drat      wt     qsec     vs      am   gear   carb
1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125

I should also mention that, although I think the details of this strategy are interesting, I also think it is needlessly complicated. It sounds like the OP's goal is to manipulate the data and then write it to a file with the same name as the original, unmanipulated, data frame, which can easily be done using more straightforward methods. For example, if we are dealing with multiple data frames we can just do something like the following:

library(dplyr)

df_list <- list(mtcars = mtcars, iris = iris)

for(name in names(df_list)){
    df_list[[name]] %>% 
        group_by_if(is.factor) %>%
        summarise_all(mean) %>% 
        write.csv(paste0(name, ".csv"))
}
  • 2
    I thought about posting this but decided against it as the question is a couple of years old and there is not much you can enhance in the linked function. Two things I would suggest are to scrap this part `&& i < sys.nframe()` as it does not seem to ever turn FALSE and use `message(deparse(parent.frame(i)$lhs))` to print the name before `return(df)`. I wouldn't return a list as further piping would be more complicated. – JBGruber Oct 24 '19 at 13:55
  • @JBGruber, yeah, I was almost done answering before I noticed it was from 2015. I was kind of annoyed but I had already committed to it, so I decided to post anyway. Not sure how the bounties work when they're from someone other than the OP. I expect `get_orig_name` would only be used at the end of a pipe, so further piping shouldn't be an issue. It is also easy enough to do something like `… %>% get_orig_name %>% \`$\`(output) %>% …` if you want to continue working with the data after getting the data frame name. I think I'll leave things as they are until this post gets some more attention. –  Oct 24 '19 at 14:11
  • 1
    At least I learned something while working on it. +1 for the explanation of how the function works, which was not in the original answer. – JBGruber Oct 24 '19 at 14:31
  • My understanding is that this no longer works. I've added a [new answer](https://stackoverflow.com/a/72311400/5009249) to the linked question, fully stolen from [Artem Sokolov's answer](https://stackoverflow.com/a/52080518/5009249) elsewhere. – Unrelated May 19 '22 at 21:49
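For readers on a setup where the `chain_parts` lookup does not work, the following is a rough base-R sketch of a different idea: inspecting the call stack for the pipe call itself rather than relying on variables inside magrittr's internals. It is not a copy of either linked answer; `pipe_origin` is a hypothetical name, and the sketch assumes the pipe in use is magrittr's `%>%` (as re-exported by dplyr) and that the left-most expression is a plain symbol.

```r
library(magrittr)  # dplyr re-exports this same %>% operator

# Hypothetical helper: return the left-most expression of the enclosing pipe
pipe_origin <- function(x) {
  # Walk the call stack from the outermost call inwards
  for (cl in as.list(sys.calls())) {
    if (is.call(cl) && identical(cl[[1]], quote(`%>%`))) {
      lhs <- cl[[2]]
      # a %>% f %>% g parses as `%>%`(`%>%`(a, f), g), so keep descending
      # into the left-hand side while it is itself a pipe call
      while (is.call(lhs) && identical(lhs[[1]], quote(`%>%`))) {
        lhs <- lhs[[2]]
      }
      return(deparse(lhs))
    }
  }
  # Not inside a pipe: fall back to the usual substitute() trick
  deparse(substitute(x))
}

mtcars %>% head %>% pipe_origin
```

Because it returns only the name, a helper like this is best placed at the end of a pipe, or adapted to emit the name with `message()` and return the data, as suggested in the comments above.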
2

Here's a hacky way of doing it, which I'm sure breaks in a ton of edge cases:

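library(dplyr)      # for %>%
library(data.table) # for the address() function
                    # (or parse the output of .Internal(inspect(x)) if you feel masochistic)

fn = function(tbl) {
  # search the environment where fn was defined (here, the global environment)
  env = parent.env(environment())
  objs = ls(env)
  # keep the names of objects stored at the same memory address as tbl
  objs[sapply(objs, function(x) address(get(x, envir = env)) == address(tbl))]
}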

df = data.frame(a = 1:10)
df %>% fn
#[1] "df"
eddi
0

Inspired by the link mentioned by gersht.

You can go up five frames with parent.frame to get the name:

df %>% {parent.frame(5)$lhs}

An example is below:

library(dplyr)

a <- 1

df1 <- data.frame(a = 1:10)

df2 <- data.frame(a = 1:10)

a %>% {parent.frame(5)$lhs}

df1 %>% {parent.frame(5)$lhs}

df2 %>% {parent.frame(5)$lhs}
Kevin Ho
0

Although the question is an old one, and the bounty has already been awarded, I would like to expand on gersht's excellent answer, which works perfectly well for getting the name of the leftmost object in a pipe. However, integrating this functionality into a dplyr workflow is not yet solved, apart from using the approach in the very last step of a pipe.

Since I'm using dplyr a lot, I have created a group of custom wrapper functions around the common dplyr verbs which I call metadplyr (I'm still playing around with the functionality, which is why I haven't uploaded it to GitHub yet).

In essence, those functions create a new class called meta_tbl on top of a tibble and write certain things in the attributes of that object. Applied to the problem of the OP I provide a simple example with filter, but the procedure works on any other dplyr verb as well.

In my original function family I use slightly different names than dplyr, but the approach also works when 'overwriting' the original dplyr verbs.

Below is a new filter function which turns a data frame or tibble into a meta_tbl and writes the original name of the lhs object into the attribute .name. Here I am using a short version of gersht's approach.

library(dplyr)

filter <- function(.data, ...) {

    # default: pass the input through unchanged if it is already a 'meta_tbl'
    .data2 <- .data

    if(!("meta_tbl" %in% class(.data))) {

      .data2 <- as_tibble(.data)

      # add new class 'meta_tbl' to data.frame
      attr(.data2, "class") <- c(attr(.data2, "class"), "meta_tbl")

      # write lhs original name into attributes
      # (stop at the top of the call stack so the loop cannot run forever)
      i <- 1
      while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
        i <- i+1
      }
      attr(.data2, ".name") <- deparse(parent.frame(i)$lhs)

    }

    dplyr::filter(.data2, ...)

}

For convenience it is good to have some helper function to extract the original name from the attributes easily.

.name <- function(.data) {
  if("meta_tbl" %in% class(.data)) {
    attr(.data, ".name")
  } else {
    stop("this function only works on objects of class 'meta_tbl'")
  }
}

Both functions can be used in a workflow in the following way:

mtcars %>% 
  filter(gear == 4) %>% 
  write.csv(paste0(.name(.), ".csv"))

This might be a bad example, since the pipe ends at write.csv, but in principle the original name travels with the data in its attributes, so it stays available to any later step of a longer pipe.

TimTeaFan
-2

I don't believe this is possible without adding an extra argument to your my_function. When chaining functions with dplyr it automatically converts the df to a tbl_df object, hence the new name "." within the dplyr scope to make the piping simpler.

The following is a very hacky way with dplyr, which just adds an additional argument to return the name of the original data.frame:

my_function <- function(tbl, orig.df){print(deparse(substitute(orig.df)))}
df %>% my_function(df)
#[1] "df"

Note you couldn't just pass the df with your original function because the tbl_df object is automatically passed to all subsequent functions.

cdeterman
  • I'd argue that this is worse than doing `my_fn=function(tbl, origin.df) print(origin.df); df %>% my_fn("df")`, which is manifestly pointless. – eddi May 05 '15 at 19:15
  • @eddi I completely agree, I only intended to provide what the OP asked for without using an additional package. The entire thing is, as you say, pointless. – cdeterman May 05 '15 at 19:18