
I would like to automatically rename data frames if they fulfill certain conditions. I have two questions about this.

  1. In the code below, the `rm` call fails, but I do not understand why.
  2. I am wondering if there is a faster/better way to do this (for example, by first putting the data frames in a list, renaming them, and unlisting).

Example data:

df_a <- data.frame(
        A = c("a", "b", "c"),
        B = c("a", "b", "c"),
        C = c("a", "b", "c")
        )

df_b <- data.frame(
        same_as_A = c("a", "b", "c"),
        same_as_B = c("a", "b", "c"),
        same_as_C = c("a", "b", "c")
        )

My attempt is the following (where the condition is that more than 2 columns match):

# names of the data
names_of_dataset_X <- c("A", "B", "C") 
names_of_dataset_Y <- c("same_as_A", "same_as_B", "same_as_C") 

dfs <- ls()
for (i in seq_along(dfs)) {
  if (  sum( names( get( dfs[i] ) ) %in% names_of_dataset_X) > 2) {
            dataset_X <- copy(get( dfs[i] ))
            rm(get( dfs[i] ))
  } else if (TRUE) {
     dataset_Y <- copy(get( dfs[i] ))
     rm(get( dfs[i] ))
  }
}
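
Regarding question 2, the list-based idea could be sketched roughly as follows. This is only a minimal sketch under assumptions: `classify_df` is a hypothetical helper introduced here for illustration, and the "more than 2 matching columns" rule is taken from the attempt above. It collects the data frames with `mget()`, renames the list elements, and only touches the list rather than the individual objects.

```r
# Example data mirroring df_a and df_b from the question
df_a <- data.frame(A = c("a", "b", "c"), B = c("a", "b", "c"), C = c("a", "b", "c"))
df_b <- data.frame(same_as_A = c("a", "b", "c"),
                   same_as_B = c("a", "b", "c"),
                   same_as_C = c("a", "b", "c"))

names_of_dataset_X <- c("A", "B", "C")

# Hypothetical helper: returns the new name for one data frame
classify_df <- function(df) {
  if (sum(names(df) %in% names_of_dataset_X) > 2) "dataset_X" else "dataset_Y"
}

# Collect all objects of the current environment by name,
# then keep only the data frames (vectors and functions are dropped)
env <- environment()
all_objects <- mget(ls(env), envir = env)
dfs <- Filter(is.data.frame, all_objects)

# Rename the list elements according to their content
names(dfs) <- vapply(dfs, classify_df, character(1))

# Access by the new, content-based name
dfs[["dataset_X"]]
```

If separate objects are still wanted afterwards, `list2env(dfs, envir = env)` would write the renamed elements back into the environment. Note that this sketch assumes at most one data frame per category; with duplicates, the list would contain repeated names.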
Tom
  • One cannot `rm` an object; one must `rm` the *name* of the object. Strictly speaking, you are removing the named reference to the object, after which the object will (eventually) be garbage-collected. Perhaps you mean `rm(list=dfs[i])`? Said differently, `rm(data.table(a=1))` cannot work, yet that is in effect what `rm(get(...))` is attempting. – r2evans Apr 20 '22 at 13:50
  • @r2evans Thank you very much for your explanation! I tried `rm(list=dfs[i])`, but it removes `names_of_dataset_X` along the way, something I did not really think through. I should probably just store the old name in a string and feed it to `rm`. But then I have to find out what the old name is, which was the problem in the first place, I guess, haha. – Tom Apr 20 '22 at 14:04
  • Most likely, yes. BTW, `copy` is required only if you intend `dataset_Y` to be completely separate from the target of `dfs[i]`, so that changes to one are not reflected in the other. Since your intent is to remove the object pointed to by `dfs[i]`, you should not care about this, so the call to `copy` is unnecessary (and, with large objects, may take a noticeable amount of time). Observation: moving names around the way I think you're doing here *suggests* that you have functions that require certain names to be present; *that is really bad practice*, and I strongly urge you to find a different path. – r2evans Apr 20 '22 at 14:08
  • I am trying to prevent custom naming conventions for Excel files, by different people (who do not code), from affecting my code. So I thought of more or less checking the file content instead of the naming convention. If there is a better practice for this, I would be very interested. – Tom Apr 20 '22 at 14:14
  • Can you be more specific about what your data (and the differences between them) look like and what your output should be? Your sample data does not help: all columns in both data frames hold the same data, and although you name them same_as_A etc., by name none of the columns match, while by data all of them do. Better sample data and your desired (fixed) output would help. There are usually plenty of ways to fix sloppily named columns (to a certain extent). We just need to know your situation. – Merijn van Tilborg Apr 20 '22 at 14:42
  • Tom, while I sympathize with needing to accommodate non-coders, it's a bit difficult to advise without knowing the larger context. For me, for instance, I tend to export functions that take data (in the "one or more" sense, using `...`) and either infer object names (NSE, perhaps with `deparse(substitute(.))`) or do not care about names and work with the frames/tables directly. Depending on the other users, if they prefer tools other than R, then I often shift to `shiny` or `plumber` to keep their use of R tools strictly under my control. – r2evans Apr 20 '22 at 14:51
  • @r2evans Thanks, I do understand. Thank you for your attempt to explain! – Tom Apr 20 '22 at 14:55
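
Putting the comments above together, one way the original loop might be repaired is sketched below. It is a sketch under assumptions rather than a definitive fix: `df_names`, `nm`, `df`, and `env` are names introduced here for illustration. The candidate names are filtered to data frames first, so helper vectors such as `names_of_dataset_X` are never passed to `rm`, the name (a string) is handed to `rm(list = ...)` instead of the object, and `copy` is dropped because the original is removed anyway.

```r
# Example data mirroring df_a and df_b from the question
df_a <- data.frame(A = c("a", "b", "c"), B = c("a", "b", "c"), C = c("a", "b", "c"))
df_b <- data.frame(same_as_A = c("a", "b", "c"),
                   same_as_B = c("a", "b", "c"),
                   same_as_C = c("a", "b", "c"))

names_of_dataset_X <- c("A", "B", "C")

env <- environment()

# Keep only the names that refer to data frames, so that character vectors
# like names_of_dataset_X are never candidates for removal
df_names <- Filter(function(nm) is.data.frame(get(nm, envir = env)), ls(env))

for (nm in df_names) {
  df <- get(nm, envir = env)
  if (sum(names(df) %in% names_of_dataset_X) > 2) {
    dataset_X <- df   # plain assignment; no copy() needed
  } else {
    dataset_Y <- df
  }
  # rm() needs the *name* as a string, not the object itself
  rm(list = nm, envir = env)
}

# Clean up the loop helpers
rm(df, nm)
```

After this runs, `dataset_X` and `dataset_Y` exist, `df_a` and `df_b` are gone, and `names_of_dataset_X` is untouched.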
