0

I have about 50 unique data frames with the same column names.

Right now I have something like this:

df1_cleaned <- df1$price
df1_cleaned <- df1$date 
df2_cleaned <- df2$price
df2_cleaned <- df2$date
...

There are 10 columns in the uncleaned data frames, I'm only interested in getting the 3rd and 5th columns of each.

Is there a way to do this without writing out every single data frame?

Cheers

datacookies
  • 13
  • 1
  • 8
  • 1
    Use a `list` to store all your `data.frames`, then loop over them. Or probably more sensibly, collect all your `data.frames` into one big `data.frame` and then just select the 3rd and 5th columns once. Something like this sort of logic - https://stackoverflow.com/questions/19460120/looping-through-list-of-data-frames-in-r or for making one big `data.frame` - https://stackoverflow.com/questions/15162197/combine-rbind-data-frames-and-create-column-with-name-of-original-data-frames – thelatemail Dec 06 '19 at 00:11
  • 1
    When you are executing your 2nd statement you are overwriting the 1st one. – Ronak Shah Dec 06 '19 at 00:13
  • 1
    When @thelatemail suggested a `list` of frames, you might use a `for` loop, or more directly you can use `list_of_frames_updated <- lapply(list_of_frames, my_func)`. – r2evans Dec 06 '19 at 00:17

2 Answers2

2

You can extract multiple columns at once from a dataframe by doing df_cleaned <- df[,c("price","date")]. But for processing 50 different dataframes (all with the same column names), you can use a for loop. Here is a way to do it:

1) define the list of dataframes to be proceed

list_of_df = list.files(pattern = ... , path = ...) # ... stands for argument you have to pass and dependent of your files. check the ?list.files

2) loop over the list of dataframes, clean them and assign them to your environment:

for(i in list_of_df)
{
  df = read.table(i,...) # ... stands for arguments to be passed in this function and dependent of your type of file. Check ?read.table 
  df <- df[,c("price","date")]
  assign(paste0(i,"_cleaned"), df,.GlobalEnv)
}

Advantages: you will have your 50 dataframe cleaned and ready to be used in your environment. Inconvenients: You have 50 dataframes in your environment which can be messy

Alternative: As proposed by @thelatemail is to stored these 50 cleaned dataframes into a list. Like that you will have a single object in your environment filled with your dataframes. To do that, the procedure is about the same:

1) define the list of dataframes to be proceed

list_of_df = list.files(pattern = ... , path = ...) # ... stands for argument you have to pass and dependent of your files. check the ?list.files

2) Create a list object to stored dataframes

final_list = vector("list", length(list_of_df)
names(final_list) = list_of_df

3) loop over your list of dataframes, clean them and add them to the list object:

for(i in 1:length(list_of_df))
{
  df = read.table(list_of_df[i],...) # ... stands for arguments to be passed in this function and dependent of your type of file. Check ?read.table
  df <- df[,c("price","date")]
  final_list[[i]] <- df
}
dc37
  • 15,840
  • 4
  • 15
  • 32
1

Turning @thelatemail's comment into answer. Assuming your 50 dataframes are called df1, df2, df3 and so on. You could use mget to get them in list and select the columns via position

temp <- lapply(mget(paste0("df", 1:50)), `[`, c(1,3))

Or via names

temp <- lapply(mget(paste0("df", 1:50)), `[`, c('price', 'date'))

This approach will return you a list of dataframes, if you need individual dataframes separately, do

list2env(temp, .GlobalEnv)

Or combine all the dataframes together and subset only once.

By position

temp <- do.call(rbind, mget(paste0("df", 1:50)))[c(1, 3)]

Or by name

temp <- do.call(rbind, mget(paste0("df", 1:50)))[c('price', 'date')]

This approach will give you all the data together in one dataframe.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213