I have several .txt data files saved in a directory. The files have different numbers of rows, and each file contains several named columns. Every file has an "id" column, but the remaining column names differ from file to file. As a small example, consider df1 and df2 as two of the data files in the directory:
df1<-
structure(
list(id = c(1L, 2L, 3L, 4L),
a1=c(10L, 6L, 2L, 8L),
a2 = c(22L, 7L, 5L, 1L),
a3 = c(3L, 12L, 1L, 5L)),
.Names = c("id", "a1", "a2","a3"),
class = "data.frame",
row.names = c(NA,-4L))
df2<-structure(
list(id = c(1L, 2L, 3L),
b1=c(8L, 5L, 4L),
b2 = c(7L, 10L, 11L),
b3 = c(6L, 2L, 1L)),
.Names = c("id", "b1", "b2","b3"),
class = "data.frame",
row.names = c(NA,-3L))
What I want to do is subset each data set by a selection of its column names, say "a1" and "a2" for df1 and "b1" and "b2" for df2.
I tried the following code:
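# Point R at the directory that holds the data files
setwd(".../")
# Read each file, treating the first row as column names
df1 <- read.table("df1.txt", header = TRUE)
df2 <- read.table("df2.txt", header = TRUE)
# Keep only the columns of interest from each data frame
new.df1 <- data.frame(df1$a1, df1$a2)
new.df2 <- data.frame(df2$b1, df2$b2)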
My concern is that this approach is inefficient: there are many data files, each with many variables, so I would have to repeat the lines above for every file. Is there a way to loop through the directory and subset each data set by its relevant column names? Your help is greatly appreciated.
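To make the goal concrete, here is a minimal sketch of the kind of loop I have in mind. It is not a definitive solution. The names `data_dir`, `wanted`, and `subsets` are hypothetical, the example files are written to a temporary directory only so the sketch runs on its own, and it assumes the columns to keep can be listed per file name.

```r
# Hypothetical setup: write the two example tables to a temporary directory.
# In practice, data_dir would be the real folder holding the .txt files.
data_dir <- tempfile("data_")
dir.create(data_dir)
df1 <- data.frame(id = 1:4, a1 = c(10L, 6L, 2L, 8L),
                  a2 = c(22L, 7L, 5L, 1L), a3 = c(3L, 12L, 1L, 5L))
df2 <- data.frame(id = 1:3, b1 = c(8L, 5L, 4L),
                  b2 = c(7L, 10L, 11L), b3 = c(6L, 2L, 1L))
write.table(df1, file.path(data_dir, "df1.txt"), row.names = FALSE, quote = FALSE)
write.table(df2, file.path(data_dir, "df2.txt"), row.names = FALSE, quote = FALSE)

# Hypothetical lookup table: file name -> columns to keep from that file
wanted <- list(
  "df1.txt" = c("a1", "a2"),
  "df2.txt" = c("b1", "b2")
)

# Collect the full paths of all .txt files in the directory
files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and keep only its requested columns
subsets <- lapply(files, function(path) {
  dat <- read.table(path, header = TRUE)
  # intersect() silently skips requested names that are missing from the file
  keep <- intersect(wanted[[basename(path)]], names(dat))
  dat[, keep, drop = FALSE]
})
names(subsets) <- basename(files)
```

With this, `subsets[["df1.txt"]]` would hold columns a1 and a2, and `subsets[["df2.txt"]]` would hold b1 and b2. Keeping the results in one named list avoids creating a separate variable per file.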