I have 10 text files containing 1000's of rows.
Example: First File:
V1 V2
1 2
2 3
10 20
1 4
.....
Second file:
V1 V2
1 2
8 10
.....
What I want is a final file which will contain 12 columns. The first two columns representing the relationship and the next 10 columns telling about different files ( represented by 1 if that pair is present and 0 if not present ) For example:
Final File:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 2 1 1 0 0 1 0 1 1 0 1
2 3 1 0 1 1 0 0 0 1 1 0
Now, after searching, I found,
# Create dummy data
L <- replicate(5, expand.grid(1:10, 1:10)[sample(100, 10), ], simplify=FALSE)
# Add a column to each data.frame in L.
# This will indicate presence of the pair when we merge.
L <- lapply(seq_along(L), function(i) {
L[[i]][, paste0('DF', i)] <- 1
L[[i]]
})
# Merge all the things
# (hat-tip to @Charles - http://stackoverflow.com/a/8097519/489704)
L.merged <- Reduce(function(...) merge(..., all=T), L)
head(L.merged)
# Var1 Var2 DF1 DF2 DF3 DF4 DF5
# 1 1 2 NA NA NA 1 NA
# 2 1 5 1 NA NA NA 1
# 3 1 9 NA NA NA 1 NA
# 4 1 10 NA NA 1 1 NA
# 5 2 5 NA NA 1 NA NA
# 6 2 6 NA 1 NA 1 NA
However, the problem is that, this is not showing those rows which have all NA's. For eg:
# 7 4 5 NA NA NA NA NA