I am trying to automate subsetting data from multiple data frames. First I used a simple loop to read all the desired files in my directory:
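# list.files() expects a regular expression, so match names ending in .csv
temp=list.files(pattern="\\.csv$")
# seq_along() is safe even when no files are found
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))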
This stores every file as a separate data frame, named after the corresponding entry of the character vector temp.
All my datasets share the same structure, which looks like this example:
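For comparison, here is a minimal sketch of the same reading step that keeps the data frames in one named list instead of creating separate objects with assign(). The temporary directory, the two tiny CSV files, and the names demo_dir, files, and dfs are made up so the example runs on its own. They are not part of my real data.

```r
# Hypothetical setup: write two small CSV files into a temporary directory
# so the example runs without the original data.
demo_dir <- tempfile("cars_")
dir.create(demo_dir)
write.csv(data.frame(Car_Brand = c("CarBrand1", "CarBrand2"),
                     Fuel_Type = c("Gas", "Electric"),
                     Year_of_production = c(2014, 2007)),
          file.path(demo_dir, "Cardf1.csv"), row.names = FALSE)
write.csv(data.frame(Car_Brand = c("CarBrand2", "CarBrand3"),
                     Fuel_Type = c("Electric", "Diesel"),
                     Year_of_production = c(2001, 2013)),
          file.path(demo_dir, "Cardf2.csv"), row.names = FALSE)

# Collect the full paths of all files whose names end in .csv
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list and name each element after its file
dfs <- lapply(files, read.csv)
names(dfs) <- basename(files)

print(names(dfs))
str(dfs[["Cardf1.csv"]])
```

With a list, each data frame is reached as dfs[["Cardf1.csv"]], so no later step has to turn a name back into an object.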
Car_Brand Fuel_Type Year_of_production
CarBrand1 Gas 2014
CarBrand1 Gas 2010
CarBrand1 Gas 2007
CarBrand1 Diesel 2006
CarBrand1 Electric 2013
CarBrand1 Electric 2001
CarBrand2 Electric 2007
CarBrand2 Diesel 2004
CarBrand2 Gas 2009
CarBrand2 Gas 2004
CarBrand2 Electric 2000
CarBrand2 Electric 2001
CarBrand2 Electric 2013
CarBrand2 Diesel 2001
CarBrand2 Diesel 2006
CarBrand2 Gas 2010
CarBrand2 Gas 2002
CarBrand2 Gas 2012
CarBrand2 Electric 2009
CarBrand3 Gas 2013
CarBrand3 Gas 2009
CarBrand3 Gas 2015
CarBrand3 Gas 2007
CarBrand3 Diesel 2000
CarBrand3 Diesel 2013
And let's say the data frames are named like this: Cardf1.csv, Cardf2.csv, Cardf3.csv (the .csv suffix comes from the first piece of code and I don't mind it).
I want to subset rows based on certain conditions and store the results as new data frames with new names (let's say Cardf1.csv_electric2, Cardf2.csv_electric2, etc.). I realized I can achieve this in a very similar way to the code I used to read my data. First I created a vector of names for the new data frames:
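# newtemp has to exist before it can be filled element by element
newtemp=character(length(temp))
for(i in 1:length(temp)){
newtemp[i]=paste(temp[i],"_electric2",sep="")}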
Then I wrote a very simple custom function to make my loop easier. I wrote it in two versions:
customfun1=function(x){
subset(x, x$Car_Brand %in% "CarBrand2" & x$Fuel_Type %in% "Electric")}
customfun2=function(x){
subset(x, x[,1] %in% "CarBrand2" & x[,2] %in% "Electric")}
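As a sanity check, here is a small sketch of the same filter applied directly to a data frame, where it behaves as expected. The sample rows are taken from the table above. The names cars and filter_electric2 are only illustrative, and I am assuming the column names are spelled exactly Car_Brand and Fuel_Type.

```r
# Small sample in the same shape as the data above
cars <- data.frame(
  Car_Brand = c("CarBrand1", "CarBrand2", "CarBrand2", "CarBrand3"),
  Fuel_Type = c("Electric", "Electric", "Diesel", "Gas"),
  Year_of_production = c(2013, 2007, 2004, 2013)
)

# Same condition as customfun1; inside subset() the columns of a data frame
# can be referred to by their bare names
filter_electric2 <- function(x) {
  subset(x, Car_Brand == "CarBrand2" & Fuel_Type == "Electric")
}

result <- filter_electric2(cars)
print(result)
```

So the filtering itself seems fine as long as the function is handed an actual data frame.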
And then I put it into a loop:
for(i in 1:length(temp)){
assign(newtemp[i],customfun(temp[i]))}
Note that I wrote customfun because the same loop is used for both versions. Each one produces a different error:
customfun1:
Error in x$Car_Brand : $ operator is invalid for atomic vectors
customfun2:
Error in x[, 1] : incorrect number of dimensions
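A possible explanation, sketched below under my own assumptions: temp[i] is only the name of a data frame (a character string), not the data frame itself, so the function receives an atomic vector. The base R function get() looks up the object that has a given name. The one-file setup is invented for illustration.

```r
# Hypothetical setup: one data frame stored under the name "Cardf1.csv"
cars <- data.frame(Car_Brand = c("CarBrand2", "CarBrand1"),
                   Fuel_Type = c("Electric", "Gas"),
                   Year_of_production = c(2007, 2014))
assign("Cardf1.csv", cars)
temp <- "Cardf1.csv"

customfun1 <- function(x) {
  subset(x, x$Car_Brand %in% "CarBrand2" & x$Fuel_Type %in% "Electric")
}

# temp[1] is a character vector holding the name, not the data
print(is.data.frame(temp[1]))
# get() returns the object that is stored under that name
print(is.data.frame(get(temp[1])))

# Passing the name fails; the error message is captured instead of stopping
err <- tryCatch(customfun1(temp[1]), error = function(e) conditionMessage(e))
print(err)

# Passing the object found by get() returns the matching rows
ok <- customfun1(get(temp[1]))
print(ok)
```

If that is the cause, writing customfun1(get(temp[i])) inside the loop might be enough to make it run.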
So I believe the problem lies in my method of subsetting the data; however, I found it to be the most convenient way to subset based on non-numeric variables. Is there any way to make it work like this, or is a different approach required?
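One different approach I could imagine, as a rough sketch only: keep the data frames in a named list and apply the filter with lapply(), then build the new names with paste0(). The list dfs, its two small data frames, and the helper keep_electric2 are hypothetical stand-ins for my real data.

```r
# Hypothetical named list standing in for the data frames read from disk
dfs <- list(
  "Cardf1.csv" = data.frame(Car_Brand = c("CarBrand1", "CarBrand2", "CarBrand2"),
                            Fuel_Type = c("Gas", "Electric", "Diesel"),
                            Year_of_production = c(2014, 2007, 2004)),
  "Cardf2.csv" = data.frame(Car_Brand = c("CarBrand2", "CarBrand2", "CarBrand3"),
                            Fuel_Type = c("Electric", "Electric", "Gas"),
                            Year_of_production = c(2000, 2001, 2013))
)

# Keep only the rows for electric cars of CarBrand2
keep_electric2 <- function(x) {
  x[x$Car_Brand %in% "CarBrand2" & x$Fuel_Type %in% "Electric", , drop = FALSE]
}

# Apply the filter to every data frame, then rename the results
subsets <- lapply(dfs, keep_electric2)
names(subsets) <- paste0(names(dfs), "_electric2")

# Number of rows kept per data frame
print(sapply(subsets, nrow))
```

If separate objects are still wanted afterwards, list2env(subsets, envir = globalenv()) would copy each list element into the workspace under its name.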