
I'm going to explain what my code is supposed to do:

First, I import an arbitrary number of dataframes that all have the same number of columns and similar names (they're stock tickers).

Second, I create a function that changes the data type of the first column (from factor to date), deletes one column and finally adds an arbitrary number of columns as functions of the other columns (like returns, moving average, etc.)

Finally, since I'm working with plenty of dataframes, I want to apply this function to all of them through a loop.

Now, as simple as this sounds, I've encountered many issues with the third step.

What I found online was this, to get all names of the dataframes in my environment:

dfs <- ls()[sapply(mget(ls(), .GlobalEnv), is.data.frame)]

This gives a character vector with the dataframes' names. Since these are strings and not the actual dataframe objects, I can't loop through them directly, so I added:

sapply( dfs, function(x) {
  get(x) <- formatear(get(x))
  })

This maybe was a bad idea, since I'm not familiar with the sapply function.

Now, the sapply call I used returns an error:

Error in get(x) <- formatear(get(x)) : 
  could not find function "get<-"

I read that get(x) looks up the variable whose name is stored in the string x, so I thought I could use it on the left side of an assignment like that. My bad.

Here formatear() is my function from the second step:

formatear <- function(eq){
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)

  nt <- eq$Adj.Close[1:nrow(eq)-1]
  nt1 <- eq$Adj.Close[2:nrow(eq)]
  eq$return <- percent(c(NA, nt1/nt-1), accuracy = 0.0001)
  return(eq) 
  }

It takes a dataframe as a parameter and returns the dataframe transformed the way I intend to use it.

This works fine on a single dataframe when I assign the function's return value back to that dataframe, which is why I tried to assign it to get(x).
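For context, R has no replacement function `get<-`, which is why `get(x) <- value` fails; the writing counterpart of `get()` is `assign()`. Below is a minimal sketch of that loop. It assumes two toy dataframes (`AAA` and `BBB` are hypothetical tickers with made-up prices) and a simplified stand-in for formatear(), here called `formatear_demo`, that keeps the returns numeric instead of formatting them with percent():

```r
# Toy data standing in for the imported ticker dataframes (hypothetical values).
AAA <- data.frame(Date = c("2021-06-01", "2021-06-02", "2021-06-03"),
                  Adj.Close = c(10, 11, 12.1),
                  Volume = c(100, 200, 300))
BBB <- data.frame(Date = c("2021-06-01", "2021-06-02", "2021-06-03"),
                  Adj.Close = c(50, 45, 54),
                  Volume = c(10, 20, 30))

# Simplified stand-in for formatear(): same steps, but numeric returns.
formatear_demo <- function(eq) {
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)
  n <- nrow(eq)
  # Price today divided by price yesterday, minus 1; the first row has no return.
  eq$return <- c(NA, eq$Adj.Close[-1] / eq$Adj.Close[-n] - 1)
  eq
}

dfs <- c("AAA", "BBB")

# The environment holding the dataframes; at top level this is the global environment.
env <- environment()

# get() reads a variable by name, assign() writes a variable by name.
for (nm in dfs) {
  assign(nm, formatear_demo(get(nm, envir = env)), envir = env)
}

str(AAA)
```

This overwrites each dataframe in place, which is what the `get(x) <- ...` attempt was aiming for. It is only a sketch of that approach, and keeping the dataframes in a list is usually easier to manage.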

Next I thought of modifying my function to use assign(), so that it changes the variable in the global environment directly and I only have to call it inside sapply without assigning the result to anything. Something like this:

sapply( dfs, function(x) {formatear(get(x))})

By changing formatear() to:

formatear <- function(eq){
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)

  nt <- eq$Adj.Close[1:nrow(eq)-1]
  nt1 <- eq$Adj.Close[2:nrow(eq)]
  eq$return <- percent(c(NA, nt1/nt-1), accuracy = 0.0001)
  assign(deparse(substitute(eq)), eq, envir = globalenv()) #Changed here
  }

I used deparse(substitute(eq)) because assign() needs the variable name as a string to find it in the global environment. I found that piece of code online too. This didn't work: it created new dataframes (with the correct format, though) with weird names:

structure(list(Date = structure(c(16962, 16965, 16966, 16967, #this is one of the names of these new dataframes

It also returns:

In assign(deparse(substitute(eq)), eq, envir = globalenv()) : only the first element is used as variable name
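A note on what deparse(substitute(eq)) actually returns may help here: it is the text of whatever expression was passed as `eq`, and that is only a variable name when the function is called directly on a variable. The sketch below illustrates this with a hypothetical helper, `arg_text`, and a toy dataframe `AAA`. How the odd names arose in the original session is an assumption; the third case is just one way a function can end up receiving the data itself rather than a name.

```r
# Hypothetical helper: returns the text that assign() would receive as a name.
arg_text <- function(eq) deparse(substitute(eq))

# Toy dataframe (hypothetical values).
AAA <- data.frame(Date = "2021-06-01", Adj.Close = 10)

# Called directly on a variable: the text is the variable name.
direct <- arg_text(AAA)

# Called on an expression: the text is the expression, not the name stored in x.
via_get <- sapply("AAA", function(x) arg_text(get(x)))

# Called with an already evaluated value: the text is a dump of the data,
# possibly split across several strings.
via_call <- do.call(arg_text, list(AAA))

print(direct)
print(via_get)
print(via_call[1])
```

When the deparsed text spans several strings, assign() uses only the first one as the name, which matches the warning above.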

And it didn't even loop through all of the dataframes, just two. So that's the whole story. I don't know what else to do, and googling doesn't seem to help. Advice on any of the problems I explained would be welcome. Also, is there a way to store all the dataframes in my environment in a vector so I can use a for loop? Anyway, thank you in advance.

  • 1
    Ultimately I think this comes down to fighting against R rather than working with it. Things would be a lot easier if, instead of having a bunch of data.frames floating around in your global environment, you kept related data in a properly named list. See [how to make a list of data.frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames). You'll be better off avoiding things like `get`/`mget`/`assign` – MrFlick Jun 11 '21 at 03:55
  • 2
    Have you considered reading your data.frames into a list to start with? You could read them all in with, e.g., an `lapply()` loop. This can be a convenient way to start tackling these sorts of problems (i.e., lots of similar datasets that you want to do the same thing to). – aosmith Jun 11 '21 at 03:59

1 Answer


Keep your original formatear function:

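# percent() with an `accuracy` argument is assumed to come from the scales package.
library(scales)

formatear <- function(eq){
  eq$Volume <- NULL              # drop the Volume column
  eq$Date <- as.Date(eq$Date)    # convert the Date column to class Date
  
  # 1:nrow(eq)-1 parses as (1:nrow(eq)) - 1, i.e. 0:(n-1); index 0 is ignored,
  # so nt holds rows 1..n-1 and nt1 holds rows 2..n.
  nt <- eq$Adj.Close[1:nrow(eq)-1]
  nt1 <- eq$Adj.Close[2:nrow(eq)]
  eq$return <- percent(c(NA, nt1/nt-1), accuracy = 0.0001)  # simple return, formatted as a string
  return(eq) 
}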

You can use mget to get a list of the dataframes and apply the function to each one with lapply.

clean_list_data <- lapply(mget(dfs), formatear)

clean_list_data should be a list of dataframes in the format that you want. You can access individual dataframes with clean_list_data[[1]], clean_list_data[[2]] and so on. It is easier to manage the data if you keep them in a list like this instead of creating multiple dataframes in the global environment.
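The list-based approach above can be tried end to end with toy data. This is a minimal sketch under stated assumptions: `AAA` and `BBB` are hypothetical tickers with made-up prices, and `formatear_demo` is a simplified stand-in for formatear() that keeps the returns numeric so the example does not depend on percent(). The last step, copying the cleaned dataframes back with list2env(), is optional and only needed if separate variables are really required.

```r
# Toy data standing in for the ticker dataframes (hypothetical values).
AAA <- data.frame(Date = c("2021-06-01", "2021-06-02", "2021-06-03"),
                  Adj.Close = c(10, 11, 12.1),
                  Volume = c(100, 200, 300))
BBB <- data.frame(Date = c("2021-06-01", "2021-06-02", "2021-06-03"),
                  Adj.Close = c(50, 45, 54),
                  Volume = c(10, 20, 30))

# Simplified stand-in for formatear(): same steps, but numeric returns.
formatear_demo <- function(eq) {
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)
  n <- nrow(eq)
  eq$return <- c(NA, eq$Adj.Close[-1] / eq$Adj.Close[-n] - 1)
  eq
}

# The environment holding the dataframes; at top level this is the global environment.
env <- environment()
dfs <- c("AAA", "BBB")

# mget() returns a named list, and lapply() keeps those names.
clean_list_data <- lapply(mget(dfs, envir = env), formatear_demo)

# Elements can be reached by name as well as by position.
print(names(clean_list_data))
print(clean_list_data[["BBB"]])

# A plain for loop over the list also works.
for (nm in names(clean_list_data)) {
  cat(nm, ": ", nrow(clean_list_data[[nm]]), " rows\n", sep = "")
}

# Optional: overwrite the original variables with the cleaned versions.
list2env(clean_list_data, envir = env)
```

Because the list is named, `clean_list_data[["AAA"]]` stays valid even if the order of the dataframes changes, which positional access like `clean_list_data[[1]]` does not guarantee.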

Ronak Shah