4

This question is not a duplicate because my data.frame does not have the same amount of NA values in all Columns and therefore the solution mentioned in that question does not work.

I have a data.frame with a lot of NA values and I would like to delete all cells (important: not rows or columns, cells) that have NA values. The original would look like this:

A  B
1  NA
NA 2
2  NA
NA NA
NA NA
NA 4
3  5

The desired result would look like this:

A  B
1  2
2  4 
3  5

The number of columns would have to stay the same, but it does not matter if the values remain on the same rows. They can just be moved up.

I could image one could delete all cells with the condition NA (maybe with apply) and get the result. Or maybe a simple sorting ?

Thanks.

Update:

A   B   C
1       3
    2   
4       3

    1   2

3       5
        4
    9   
7       1
ItamarG3
  • 4,092
  • 6
  • 31
  • 44
digit
  • 1,513
  • 5
  • 29
  • 49
  • 1
    Try `data.frame(lapply(df1, na.omit))` – akrun Jul 28 '17 at 14:52
  • 6
    does each column have the same number of NA? If not, you'll have a problem bc data.frames require columns of the same length. You could just work with a normal list instead. – talat Jul 28 '17 at 14:55
  • No, there are a lot of NAs and few real values and they are not the same number. – digit Jul 29 '17 at 06:18

2 Answers2

5

The OP has requested to remove the NAs columnwise but has pointed out that there might be different numbers of NA in each column.

This can be solved using data.table in two steps:

library(data.table)
# step 1: coerce to data.table in place, move NAs to the bottom of each column, 
# maintain the original order of non-NA values
result <- data.table(DF)[, lapply(.SD, function(x) x[order(is.na(x))])]
     A  B  C
 1:  1  2  3
 2:  4  1  3
 3:  3  9  2
 4:  7 NA  5
 5: NA NA  4
 6: NA NA  1
 7: NA NA NA
 8: NA NA NA
 9: NA NA NA
10: NA NA NA
# step 2: trim result
# either using Reduce
result[!result[, Reduce(`&`, lapply(.SD, is.na))]]

# or using zoo::na.trim()
zoo::na.trim(result, is.na = "all")
    A  B C
1:  1  2 3
2:  4  1 3
3:  3  9 2
4:  7 NA 5
5: NA NA 4
6: NA NA 1

So, there will be some NAs at the end of each colummn unavoidably because all columns in a data.frame have the same length.

Or, alternatively, only complete rows can be kept using the is.na parameter to na.trim():

zoo::na.trim(result, is.na = "any")
   A B C
1: 1 2 3
2: 4 1 3
3: 3 9 2

An alternative solution

As mentioned before, data.frames and cbind() expect all column vectors to have the same length. Here is an alternative solution without data.table which uses the cbind.fill() function from the rowr package which pads vectors with the fill value until the same length:

setNames(do.call(function(...) rowr::cbind.fill(..., fill = NA), lapply(DF, na.omit)),
         colnames(DF))
   A  B C
1  1  2 3
2  4  1 3
3  3  9 2
4  7 NA 5
5 NA NA 4
6 NA NA 1

Data

As supplied by the OP in the update:

DF <- structure(list(A = c(1L, NA, 4L, NA, NA, NA, 3L, NA, NA, 7L), 
    B = c(NA, 2L, NA, NA, 1L, NA, NA, NA, 9L, NA), C = c(3L, 
    NA, 3L, NA, 2L, NA, 5L, 4L, NA, 1L)), .Names = c("A", "B", 
"C"), row.names = c(NA, -10L), class = "data.frame")
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
2

You can try ,Base on @UweBlock's Data and your updated question.

dat=as.data.frame(na.omit(apply(dat,2,function (x) x[order(is.na(x))])))

dat
  A B C
1 1 2 3
2 4 1 3
3 3 9 2
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks for your help. I get an empty variable/file after using this . Does the initial dat need to be converted to anything specific (data.frame, list, matrix) ? Or what am I missing here. – digit Jul 29 '17 at 06:17
  • @user413734 I assuming you are working with data.frame – BENY Jul 29 '17 at 14:16
  • Yes I do. After I run your code I get chr[0, 1:255] in the environment panel of Rstudio. I have 255 columns in my original data. All transformations before that show the same number of columns and rows. It seems this code gets rid of all rows where NA is and all occurrences of NA in a column. Any ideas what I could try else. Thank you for your help. – digit Jul 31 '17 at 06:20
  • I just tried your code with the updated example file and even though it did in principle what was expected it left out the the values 7 in row A and 5, 4, 1 in row C. I believe it has to be the same amount of values (3). But for me it does not matter if the rows are kept intakt to the original file. – digit Jul 31 '17 at 07:04
  • @user413734, Then , you can just using `apply(dat,2,function (x) x[order(is.na(x))])` – BENY Jul 31 '17 at 13:51
  • @Wen Please, can you mention that your answer turns the data.frame silently into a matrix? – Uwe Jul 31 '17 at 14:51
  • @UweBlock The first edit of the question , Op mention it is matrix, If it is datad.frame , i will recommend akrun's answer – BENY Jul 31 '17 at 15:22
  • @wen With the updated sample data with varying number of `NA`s, Akrun's answer will stop with an errror messages. According to the edit history, the the OP mentioned `apply` which may have caused the misunderstanding. – Uwe Jul 31 '17 at 15:59
  • 1
    @UweBlock Read thru the updated question and Edit my answer ~Thank you for your help~ :) – BENY Jul 31 '17 at 16:00