
I have a data.frame with mostly NA values and some data. To clean it up, I need to select and copy just the non-NA values (not whole rows or columns containing NAs, only the values themselves). This seemed like an easy task, but none of the solutions I have tried so far worked. To be clear, I do not need to keep the rows intact in the resulting variable/file. Instead, I want to remove the NA values from each column individually, as if the columns were independent lists and I sorted all cells that contain values to the top and all NA cells to the bottom.

Thank you.

Update with an example:

Col 1   Col 2    Col 3
Bar     NA       But
NA      There    NA
Foo     NA       NA
NA      NA       Not
NA      NA       NA
Here    NA       NA
NA      Better   NA

Desired result:

Col 1   Col 2    Col 3
Bar     There    But
Foo     Better   Not
Here

I need to keep the columns intact, but within each column all values should move up. So if I could select all the non-NA values and paste them into a new data frame (or any other structure) with the same number of columns, containing only the values and no NAs, that would solve it.

Hope that makes it clearer. Thank you.
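The approach described in the question, sorting the cells that hold values to the top of each column and the NAs to the bottom, can be sketched directly in base R. This is a minimal sketch assuming character or numeric columns; the column names Col1 to Col3 are assumed, since R would turn "Col 1" into "Col.1". Every column keeps the same length, and the rows that end up NA in every column are then dropped.

```r
# Example data from the question (column names Col1..Col3 are assumed).
df <- data.frame(
  Col1 = c("Bar", NA, "Foo", NA, NA, "Here", NA),
  Col2 = c(NA, "There", NA, NA, NA, NA, "Better"),
  Col3 = c("But", NA, NA, "Not", NA, NA, NA),
  stringsAsFactors = FALSE
)

moved <- df
# order(is.na(x)) puts non-NA positions (FALSE) before NA positions (TRUE);
# ties keep their original order, so the values stay in their original sequence.
moved[] <- lapply(moved, function(x) x[order(is.na(x))])

# Drop rows that are NA in every column (they now only hold padding).
moved <- moved[rowSums(!is.na(moved)) > 0, , drop = FALSE]
print(moved, row.names = FALSE)
```

The result has as many rows as the fullest column has values; shorter columns are padded with NA rather than left blank.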

Florian
digit
  • Please provide [reproducible example](http://stackoverflow.com/questions/5963269) along with expected output – Sotos Jul 31 '17 at 08:54
  • Don't really understand without an example, but maybe you could loop/apply over all columns and insert each new column into a list at every step? – MLEN Jul 31 '17 at 09:01
  • Not clear at all to me, but maybe you are looking for `which(is.na(df),arr.ind=TRUE)` (wild guess though; provide an example!) – nicola Jul 31 '17 at 09:05
  • Your desired output is a bit strange. What happens if the columns each have a different number of non-NA values? It's better to have a `list` as the result. You can get one by applying the first line suggested in @Florian's answer (`lapply(df,function(x) x[!is.na(x)])`). – nicola Jul 31 '17 at 09:24
  • Yep, I know it is strange; that is why I have not found an answer yet. Think of it as a cleaning effort where row numbers don't matter and all the information is in the cells. All I need is to get all the cells with values and keep my 255 rows intact. Any way to achieve this would be great. – digit Jul 31 '17 at 09:31
  • Both of the answers you gave me create a very large list; I don't get the desired result and I can't view it in RStudio. If I export it to xlsx, I get a strange result where every column is in a new sheet. Can I add it all to one new data.frame? – digit Jul 31 '17 at 09:44
  • @user413734 have you checked my answer? It seems to work on a small example just fine. – Florian Jul 31 '17 at 10:46
  • So sorry, it works! Thanks a million. I thought it was either df2 or df3, but both combined do the trick. I will check whether there are any problems, but this question is answered. – digit Jul 31 '17 at 11:22
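The comments above report that exporting the list result to xlsx put every column on its own sheet. One possible alternative, sketched under the assumption that a CSV file is acceptable instead of xlsx, is to convert the padded result to a single data frame first and write it with base R's write.csv(); na = "" writes the padding NAs as empty cells. The output path is a temporary file here, standing in for a hypothetical name such as cleaned.csv.

```r
# Sample data and steps from the answer below.
df  <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 4))
df2 <- lapply(df, function(x) x[!is.na(x)])
df3 <- sapply(df2, `[`, seq_len(max(sapply(df2, length))))

# sapply() returned a matrix; turn it into one data frame (one column per original column).
out <- as.data.frame(df3)

# Write everything to ONE file; the NA padding becomes empty cells.
# tempfile() stands in for a hypothetical path such as "cleaned.csv".
path <- tempfile(fileext = ".csv")
write.csv(out, path, row.names = FALSE, na = "")

# Read the file back to check its contents.
back <- read.csv(path)
print(back)
```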

1 Answer


If I understand you correctly, this does what you want:

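# sample data

df  = data.frame(a=c(1,NA,2),b=c(NA,NA,4))
# remove the NAs from each column separately (gives a list of vectors)
df2 = lapply(df, function(x) {x[!is.na(x)]})
# pad every vector with NAs up to the longest length and bind them as columns
df3 = sapply(df2, '[', seq_len(max(sapply(df2,length))))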

Input:

   a  b
1  1 NA
2 NA NA
3  2  4

Output 1, as a list of vectors:

> df2
$a
[1] 1 2

$b
[1] 4

Output 2, as a matrix (wrap it in as.data.frame(df3) if you need an actual data frame):

> df3
     a  b
[1,] 1  4
[2,] 2 NA
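A note on the design: sapply() simplifies its result, so df3 is a matrix rather than a data frame, and if the longest column has only one non-NA value it collapses further into a plain named vector. A variant that always returns a data frame, sketched here with base R's lengths() and lapply() in place of sapply(), could look like this (the names n, df4, one and one_df are just illustrative):

```r
df  <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 4))

# Drop NAs column by column (same as df2 above).
df2 <- lapply(df, function(x) x[!is.na(x)])

# Pad each vector with NAs to the longest length: indexing past the end of a
# vector returns NA. lapply() keeps a list, so no simplification happens.
n   <- max(lengths(df2))
df4 <- as.data.frame(lapply(df2, `[`, seq_len(n)), stringsAsFactors = FALSE)
print(df4)

# With at most one value per column, sapply() would return a plain vector,
# while this variant still yields a one-row data frame.
one    <- lapply(data.frame(a = c(NA, 5), b = c(7, NA)), function(x) x[!is.na(x)])
one_df <- as.data.frame(lapply(one, `[`, seq_len(max(lengths(one)))))
print(one_df)
```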

Hope this helps!

Florian