
I am trying to automate subsetting data from multiple data frames. First I used a simple loop to read all the desired CSV files in my directory:

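temp=list.files(pattern="\\.csv$")  # pattern is a regular expression: names ending in .csv
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))  # one data frame per file, named after the file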

This creates each file as a separate data frame, named after the corresponding element of the character vector temp. All my datasets share the same structure. For example:

CAR_Brand   Fuel_type       Year_of_production
CarBrand1   Gas             2014
CarBrand1   Gas             2010
CarBrand1   Gas             2007
CarBrand1   Diesel          2006
CarBrand1   Electric        2013
CarBrand1   Electric        2001
CarBrand2   Electric        2007
CarBrand2   Diesel          2004
CarBrand2   Gas             2009
CarBrand2   Gas             2004
CarBrand2   Electric        2000
CarBrand2   Electric        2001
CarBrand2   Electric        2013
CarBrand2   Diesel          2001
CarBrand2   Diesel          2006
CarBrand2   Gas             2010
CarBrand2   Gas             2002
CarBrand2   Gas             2012
CarBrand2   Electric        2009
CarBrand3   Gas             2013
CarBrand3   Gas             2009
CarBrand3   Gas             2015
CarBrand3   Gas             2007
CarBrand3   Diesel          2000
CarBrand3   Diesel          2013
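To make the example reproducible, the table above can be rebuilt directly in R with read.table(text = ...). This is a sketch: the object name cars_example is hypothetical, and the column names are copied exactly from the header as written (R column names are case-sensitive).

```r
# Rebuild the example table as a data frame (cars_example is a hypothetical name)
cars_example <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
CAR_Brand Fuel_type Year_of_production
CarBrand1 Gas       2014
CarBrand1 Gas       2010
CarBrand1 Gas       2007
CarBrand1 Diesel    2006
CarBrand1 Electric  2013
CarBrand1 Electric  2001
CarBrand2 Electric  2007
CarBrand2 Diesel    2004
CarBrand2 Gas       2009
CarBrand2 Gas       2004
CarBrand2 Electric  2000
CarBrand2 Electric  2001
CarBrand2 Electric  2013
CarBrand2 Diesel    2001
CarBrand2 Diesel    2006
CarBrand2 Gas       2010
CarBrand2 Gas       2002
CarBrand2 Gas       2012
CarBrand2 Electric  2009
CarBrand3 Gas       2013
CarBrand3 Gas       2009
CarBrand3 Gas       2015
CarBrand3 Gas       2007
CarBrand3 Diesel    2000
CarBrand3 Diesel    2013
")

# Inspect the column types
str(cars_example)
```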

Let's say the data frames are named Cardf1.csv, Cardf2.csv and Cardf3.csv (the .csv suffix comes from the first piece of code, and I don't mind it).
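A minimal sketch of the list-based alternative suggested in the comments below: read every file into one named list, keeping the file names (including .csv) as list names instead of creating separate global variables. The temporary directory, the three small files and the name car_list are all hypothetical setup so the snippet runs on its own.

```r
# Hypothetical setup: write three small CSV files into a fresh temporary directory
csv_dir <- tempfile("cars_demo_")
dir.create(csv_dir)
for (k in 1:3) {
  df <- data.frame(CAR_Brand = c("CarBrand1", "CarBrand2", "CarBrand2"),
                   Fuel_type = c("Gas", "Electric", "Diesel"),
                   Year_of_production = c(2014, 2007, 2004))
  write.csv(df, file.path(csv_dir, paste0("Cardf", k, ".csv")), row.names = FALSE)
}

# Read every CSV into one named list (extra arguments are passed on to read.csv)
temp <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
car_list <- lapply(temp, read.csv, stringsAsFactors = FALSE)
names(car_list) <- basename(temp)  # "Cardf1.csv", "Cardf2.csv", "Cardf3.csv"
```

Each data frame is then available as car_list[["Cardf1.csv"]], and the whole collection can be processed with a single lapply() call.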

I want to subset certain rows based on certain conditions and save them as new datasets with new names (let's say Cardf1.csv_electric2, Cardf2.csv_electric2, etc.). I realized I can do this in a very similar way to the code I used to read my data. First I created a vector of names for the new objects:

newtemp=character(length(temp))  # pre-allocate the vector of new names
for(i in 1:length(temp)){
  newtemp[i]=paste(temp[i],"_electric2",sep="")
}
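Because paste0() is vectorised, the same vector of new names can also be built without a loop. A small sketch, with temp hard-coded to the example file names:

```r
# Example file names, hard-coded for illustration
temp <- c("Cardf1.csv", "Cardf2.csv", "Cardf3.csv")

# paste0() works element-wise, so no loop or pre-allocation is needed
newtemp <- paste0(temp, "_electric2")
newtemp
#> [1] "Cardf1.csv_electric2" "Cardf2.csv_electric2" "Cardf3.csv_electric2"
```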

Then I wrote a very simple custom function to make the loop easier, in two versions:

customfun1=function(x){
subset(x, x$Car_Brand %in% "CarBrand2" & x$Fuel_Type %in% "Electric")}

customfun2=function(x){
subset(x, x[,1] %in% "CarBrand2" & x[,2] %in% "Electric")}
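The subsetting logic itself works when the function receives an actual data frame. In the sketch below, the data frame cars and the function brand2_electric are hypothetical. Inside subset(), columns can be referred to by name directly. Note that the header in the table spells the columns CAR_Brand and Fuel_type, while customfun1 uses Car_Brand and Fuel_Type. Because R names are case-sensitive, the names in the function have to match the real header exactly.

```r
# Small hypothetical data frame with the same columns as the example table
cars <- data.frame(CAR_Brand = c("CarBrand1", "CarBrand2", "CarBrand2", "CarBrand3"),
                   Fuel_type = c("Electric", "Electric", "Gas", "Electric"),
                   Year_of_production = c(2013, 2007, 2009, 2015),
                   stringsAsFactors = FALSE)

# subset() evaluates the condition inside the data frame, so bare column names work
brand2_electric <- function(x) {
  subset(x, CAR_Brand == "CarBrand2" & Fuel_type == "Electric")
}

brand2_electric(cars)  # keeps only the CarBrand2 / Electric row
```

Positional indexing such as x[, 1] also works, but it silently selects the wrong column if the column order ever changes.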

And then put it into the loop:

for(i in 1:length(temp)){
  assign(newtemp[i],customfun(temp[i]))}

Note that I wrote customfun because the loop is the same for both versions. Each one generates a different error:

  customfun1:
  Error in x$Car_Brand : $ operator is invalid for atomic vectors

  customfun2:
  Error in x[, 1] : incorrect number of dimensions 
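Both errors can be reproduced by calling the two functions on a single element of temp. That element is a character string (the object's name), not the data frame itself, so `$` and `[, 1]` fail on it. This sketch captures the messages with tryCatch(). The names msg1 and msg2 are illustrative.

```r
customfun1 <- function(x) {
  subset(x, x$Car_Brand %in% "CarBrand2" & x$Fuel_Type %in% "Electric")
}
customfun2 <- function(x) {
  subset(x, x[, 1] %in% "CarBrand2" & x[, 2] %in% "Electric")
}

temp <- c("Cardf1.csv", "Cardf2.csv")
class(temp[1])  # "character": just the name, not the data frame

# Capture the error messages instead of stopping
msg1 <- tryCatch(customfun1(temp[1]), error = conditionMessage)
msg2 <- tryCatch(customfun2(temp[1]), error = conditionMessage)
msg1  # "$ operator is invalid for atomic vectors"
msg2  # "incorrect number of dimensions"
```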

So I believe the problem lies in my method of subsetting the data. However, I found it to be the most convenient way to subset based on non-numeric variables. Is there a way to make it work like this, or is a different approach required?

Alexandros
  • You should most likely be working with lists of data.frames. Have a read of [this post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames). Gregor's answer there has a number of tips for working with lists of data.frames. You can read them in with something like `myList <- lapply(list.files(pattern="*.csv"), read.csv)`. – lmo May 16 '17 at 12:33
  • When you call customfun you are only passing it the name of the object, not the actual object. Wrapping temp[i] in get() should fetch the actual object to pass to your function. Try `assign(newtemp[i],customfun(get(temp[i])))` instead – Matt Jewett May 16 '17 at 14:27
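A minimal sketch of the get() fix from the comment above, followed by the list-based alternative from the first comment. The two small data frames are hypothetical stand-ins for the files read earlier, and electric_list is an illustrative name.

```r
# Two small hypothetical data frames standing in for the files read earlier
Cardf1.csv <- data.frame(CAR_Brand = c("CarBrand1", "CarBrand2", "CarBrand2"),
                         Fuel_type = c("Gas", "Electric", "Diesel"),
                         stringsAsFactors = FALSE)
Cardf2.csv <- data.frame(CAR_Brand = c("CarBrand2", "CarBrand2", "CarBrand3"),
                         Fuel_type = c("Electric", "Electric", "Gas"),
                         stringsAsFactors = FALSE)

temp <- c("Cardf1.csv", "Cardf2.csv")
newtemp <- paste0(temp, "_electric2")
customfun2 <- function(x) {
  subset(x, x[, 1] %in% "CarBrand2" & x[, 2] %in% "Electric")
}

# get() looks up the object whose name is stored in temp[i]
for (i in seq_along(temp)) {
  assign(newtemp[i], customfun2(get(temp[i])))
}
nrow(Cardf1.csv_electric2)  # 1
nrow(Cardf2.csv_electric2)  # 2

# List-based alternative: mget() fetches all objects at once, lapply() subsets each
electric_list <- lapply(mget(temp), customfun2)
names(electric_list)  # "Cardf1.csv" "Cardf2.csv"
```

Keeping the results in a list avoids creating many global variables and makes it easy to write them out later, for example by looping over names(electric_list) with write.csv().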
