How to extract information from a dataframe name and create a column based on it

Question

Here's some mock data that represents the data I have:

pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))
pend5P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))

I need to add a column to each data frame holding the number/letter code from the data frame's name (e.g. "4P" for pend4P_17k). So far I've been doing this by hand for each one:

pend4P_17k$Pendant_ID<-"4P"
pend5P_17k$Pendant_ID<-"5P"

However, I have many dataframes to apply this to, so I'd like to create a function that can pull the information out of the dataframe name and apply it to a new column. I have attempted to use regular expressions and pattern matching to create a function, but with no luck (I'm very new to regular expressions).

Using R version 3.5.1, Mac OS X 10.13.6

Is the pattern always 'something followed by **1** digit, **1** uppercase letter, followed by something else'? — markus, Jul 06 '20 at 13:14
@markus The pattern sometimes contains 2 digits followed by 1 uppercase letter. For example, some of the data frame names are "pend12P_17k", etc. — millie0725, Jul 06 '20 at 13:18
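Since the code is one or more digits followed by a single uppercase letter, a pattern that spells this out handles both lengths. A minimal sketch, assuming every name has the form pend<digits><LETTER>_<anything>; the vector df_names is made up for illustration:

```r
# Hypothetical names; the last one has a two-digit identifier
df_names <- c("pend4P_17k", "pend5P_17k", "pend12P_17k")

# ^pend          literal prefix at the start of the name
# ([0-9]+[A-Z])  capture group: one or more digits, then one uppercase letter
# _.*$           an underscore followed by anything up to the end
ids <- sub("^pend([0-9]+[A-Z])_.*$", "\\1", df_names)
ids
#> [1] "4P"  "5P"  "12P"
```

Names that do not match the pattern come back from sub() unchanged, so it is worth filtering with grepl() first if other objects might share the prefix.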

score 3 · Accepted Answer · answered Jul 06 '20 at 13:21


This seems like a pretty bad idea. It's better to keep your data frames in a list rather than strewn about the global environment. However, if you're insistent, it is possible:

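add_name_cols <- function()
{
  # Names of every object in the global environment
  my_global <- ls(envir = globalenv())
  for (i in my_global) {
    df <- get(i, envir = globalenv())
    # Only touch data frames whose names start with "pend";
    # is.data.frame() also copes with tibbles, whose class has length 3
    if (is.data.frame(df) && grepl("^pend", i)) {
      # Capture the two characters between "pend" and "_", e.g. "4P"
      df$Pendant_ID <- gsub("^pend(.{2})_.*$", "\\1", i)
      # Overwrite the original data frame in the global environment
      assign(i, df, envir = globalenv())
    }
  }
}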

add_name_cols()

pend4P_17k
#>   x var1 var2 Pendant_ID
#> 1 1    a    1         4P
#> 2 2    b    1         4P
#> 3 3    c    0         4P
#> 4 4    d    0         4P
#> 5 5    e    1         4P

pend5P_17k
#>   x var1 var2 Pendant_ID
#> 1 1    a    1         5P
#> 2 2    b    1         5P
#> 3 3    c    0         5P
#> 4 4    d    0         5P
#> 5 5    e    1         5P
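A minimal sketch of the list-based approach recommended above, using base R's mget(), Filter() and Map(). The "^pend" prefix and the regular expression are assumptions based on the names in the question, and pend12P_17k is added only to show a two-digit ID:

```r
# Mock data in the question's format, plus a two-digit ID
pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                         var1 = c('a', 'b', 'c', 'd', 'e'),
                         var2 = c(1, 1, 0, 0, 1))
pend5P_17k <- pend4P_17k
pend12P_17k <- pend4P_17k

# Gather every object whose name starts with "pend" into a named list,
# then keep only the data frames
dfs <- Filter(is.data.frame, mget(ls(pattern = "^pend")))

# Map() walks the data frames and their names in parallel
dfs <- Map(function(df, nm) {
  df$Pendant_ID <- sub("^pend([0-9]+[A-Z])_.*$", "\\1", nm)
  df
}, dfs, names(dfs))

dfs$pend12P_17k$Pendant_ID
#> [1] "12P" "12P" "12P" "12P" "12P"
```

The loose global objects are left untouched; from here on you work with dfs, which keeps the original names as list names.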

Allan Cameron

Is there a way you'd recommend doing it if they're in a list? I didn't have them in a list for this example, but I do eventually add them into a list later on to apply different functions to them. – millie0725 Jul 06 '20 at 13:23
My solution uses a list if you want ;) – Maël Jul 06 '20 at 13:24

@dobsonk2 you can use `lapply` to add the ID column to each data frame. The drawback is that you need a vector of the IDs that you want placed in each data frame. If the only place these IDs currently exist is in the names of the global data frames, then you can get them by harvesting the names from the global environment as in my example. – Allan Cameron Jul 06 '20 at 13:28
@grouah your solution is quite creative, but it assumes a very specific structure for the names, will throw an error if there are any "missing" values between the numbered labels, and relies on `eval(parse`, which should not be used if it can be avoided. – Allan Cameron Jul 06 '20 at 13:33
@AllanCameron This function works great for data frame names with 1-digit, 1-letter identifiers, but some of the data frames have two digits, such as "pend10P_17k", for which the function no longer works. I can't seem to figure out how to update the regular expression accordingly. I should have included an example like this in my original post, so my apologies! – millie0725 Jul 06 '20 at 14:11

@dobsonk2 try changing the `gsub` to `gsub("^pend(.{2,3})_.*$", "\\1", i)` – Allan Cameron Jul 06 '20 at 14:13

Maël · Answer 2 · answered Jul 06 '20 at 13:28

This will do the trick:

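library(dplyr)

f <- function(begin, end) {
  ids <- seq(begin, end)
  # Names such as "pend4P_17k", "pend5P_17k", ...
  df_names <- paste0("pend", ids, "P_17k")
  # Fetch each data frame by name
  listdf <- lapply(df_names, function(x) eval(parse(text = x)))
  names(listdf) <- df_names

  # Add the ID column, taking the number from ids rather than the loop index
  for (i in seq_along(listdf)) {
    listdf[[i]] <- listdf[[i]] %>% mutate(Pendant_ID = paste0(ids[i], "P"))
  }

  # Write the modified data frames back to the global environment
  list2env(listdf, .GlobalEnv)
}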

Gives the desired output:

> f(4,5)
<environment: R_GlobalEnv>

> pend4P_17k
  x var1 var2 Pendant_ID
1 1    a    1         4P
2 2    b    1         4P
3 3    c    0         4P
4 4    d    0         4P
5 5    e    1         4P

> pend5P_17k
  x var1 var2 Pendant_ID
1 1    a    1         5P
2 2    b    1         5P
3 3    c    0         5P
4 4    d    0         5P
5 5    e    1         5P
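Two limitations of f(), raised in the comments under the accepted answer, are that it needs a contiguous run of numbers and that it relies on eval(parse()). A minimal base-R sketch that avoids both by taking an explicit vector of IDs and fetching the objects with mget(); add_ids() is a hypothetical name, and the pend<N>P_17k naming is assumed from the question:

```r
# Mock data; there is no pend5P_17k, so the IDs are not contiguous
pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                         var1 = c('a', 'b', 'c', 'd', 'e'),
                         var2 = c(1, 1, 0, 0, 1))
pend12P_17k <- pend4P_17k

# Hypothetical helper: `ids` is any vector of pendant numbers,
# `env` is where the data frames live (the caller's environment by default)
add_ids <- function(ids, env = parent.frame()) {
  df_names <- paste0("pend", ids, "P_17k")
  # mget() fetches objects by name without eval(parse());
  # it stops with an error if any name is missing
  dfs <- mget(df_names, envir = env)
  for (i in seq_along(dfs)) {
    dfs[[i]]$Pendant_ID <- paste0(ids[i], "P")
  }
  # Write the modified data frames back, as f() does
  list2env(dfs, envir = env)
  invisible(dfs)
}

add_ids(c(4, 12))
pend12P_17k$Pendant_ID[1]
#> [1] "12P"
```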

score 1 · Answer 3 · answered Jul 06 '20 at 13:34

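Using `mget` and `rbindlist` (from data.table), you can stack the data frames into one table whose `myDF` column records the name of the data frame each row came from: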

library(data.table)

m1 <- mtcars[1:2, 1:3]
m2 <- mtcars[3:4, 1:3]

rbindlist(mget(ls(pattern = "^m")), idcol = "myDF")
#    myDF  mpg cyl disp
# 1:   m1 21.0   6  160
# 2:   m1 21.0   6  160
# 3:   m2 22.8   4  108
# 4:   m2 21.4   6  258
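Applied to the question's data frames, the ID column holds each data frame's name, so the identifier can be pulled out afterwards with sub(). A sketch assuming data.table is installed; all_pend is an illustrative name and the regular expression assumes the pend<digits><LETTER>_... naming:

```r
library(data.table)

# Mock data in the question's format, plus a two-digit ID
pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                         var1 = c('a', 'b', 'c', 'd', 'e'),
                         var2 = c(1, 1, 0, 0, 1))
pend12P_17k <- pend4P_17k

# Stack every "pend..." data frame; idcol records the source name
all_pend <- rbindlist(mget(ls(pattern = "^pend")), idcol = "myDF")

# Derive the identifier from the stored name, adding the column by reference
all_pend[, Pendant_ID := sub("^pend([0-9]+[A-Z])_.*$", "\\1", myDF)]
```

The result is one long table rather than one data frame per pendant, which also sidesteps writing anything back to the global environment.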
