Occasionally I need to clean very messy datasets that result from importing a PDF table into a spreadsheet. When the PDF is converted, all of the columns remain in the correct order relative to one another, but blank columns are scattered arbitrarily between them.
Here is a greatly simplified example.
data <- data.frame(
  W = sample(1:10),
  X = c("yes", "no", " ", "yes", "no", " ", " ", "no", "yes", " "),
  Y = c(" ", " ", " ", " ", " ", "no", "no", " ", " ", "yes"),
  Z = c(" ", " ", "no", " ", " ", " ", " ", " ", " ", " ")
)
# Replace the blank cells (single spaces) with NA
data$X <- gsub(" ", NA, data$X)
data$Y <- gsub(" ", NA, data$Y)
data$Z <- gsub(" ", NA, data$Z)
This results in:
> data
W X Y Z
1 6 yes <NA> <NA>
2 4 no <NA> <NA>
3 3 <NA> <NA> no
4 5 yes <NA> <NA>
5 9 no <NA> <NA>
6 1 <NA> no <NA>
7 7 <NA> no <NA>
8 8 no <NA> <NA>
9 10 yes <NA> <NA>
10 2 <NA> yes <NA>
I want to get this:
W X
1 6 yes
2 4 no
3 3 no
4 5 yes
5 9 no
6 1 no
7 7 no
8 8 no
9 10 yes
10 2 yes
How can I best accomplish this? I need code that can handle many variations of this problem, including several consecutive NA columns before the column that contains the desired values. If I could simply remove each individual NA cell while shifting the remaining values to the left, that would work. Is this possible?
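For reference, here is a rough sketch of that left-shift idea in base R. It assumes the answer columns are X, Y and Z and that each row has at most one non-NA value among them, which may not hold in my messier real data:

answer_cols <- c("X", "Y", "Z")

# For each row, drop the NA cells and keep the first remaining value
collapsed <- apply(data[, answer_cols], 1, function(row) {
  vals <- row[!is.na(row)]
  if (length(vals) > 0) vals[1] else NA
})

cleaned <- data.frame(W = data$W, X = collapsed)

This produces the desired two-column result for the toy example, but I am not sure it is the best general approach, which is why I'm asking.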