Issues filtering dataframe since NA rows creep in

Question

I am trying to filter a data frame by applying logical operators to the entire data frame, and somehow NA rows are creeping into the subsetted data frame. I have gone through "Subsetting R data frame results in mysterious NA rows" but I am unable to find a solution.

df <- data.frame(number1 = c(1:5,-13,-2,-34,24,33), number2 = c(10:3, -73, -82))
df
df[df>=0 & !is.na(df$number2),]

I am trying to filter so that no row of my original data frame with a negative value is kept. Instead I end up with an 18-row df that contains multiple NA rows.

I tried using sapply on my df to check if the logical operation works fine. But if I wrap with "which" I get all 18 rows.

sapply(names(df), function(x) df[x]>=0)

My target is to get a df with no negative values in any of the columns.

EDIT: In my case I wouldn't know how many columns the resulting df would have before I filter them, so filtering individually on columns with the & operator is out of the question. That is exactly why I was trying to apply the logical operator on the entire df.

score 1 · Accepted Answer · answered Nov 20 '16 at 08:15

The first thing you need is to reduce your logical matrix to a single logical vector with one element per row. If you want to generalize this to any number of columns, you could do either

df[Reduce(`&`, lapply(df, `>=`, 0)), ]
#   number1 number2
# 1       1      10
# 2       2       9
# 3       3       8
# 4       4       7
# 5       5       6

OR

df[rowSums(df >= 0) == ncol(df), ]
#   number1 number2
# 1       1      10
# 2       2       9
# 3       3       8
# 4       4       7
# 5       5       6
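The question's own attempt guarded against missing values with !is.na, and a comparison such as NA >= 0 yields NA rather than FALSE. As a minimal sketch, assuming all columns are numeric and that a row containing an NA should be dropped, the rowSums approach can be wrapped in a small helper. The name keep_nonneg_rows is hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the original answer): keep only the rows
# in which every column is >= 0. Assumes all columns are numeric.
keep_nonneg_rows <- function(d) {
  # d >= 0 is a logical matrix; with na.rm = TRUE an NA cell adds nothing
  # to the row sum, so a row containing an NA can never reach ncol(d).
  ok <- rowSums(d >= 0, na.rm = TRUE) == ncol(d)
  # drop = FALSE keeps the result a data frame even with a single column.
  d[ok, , drop = FALSE]
}

df <- data.frame(number1 = c(1:5, -13, -2, -34, 24, 33),
                 number2 = c(10:3, -73, -82))
res <- keep_nonneg_rows(df)
res
```

Because the helper counts columns with ncol(d), it does not need to know in advance how many columns the data frame has.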

score 0 · Answer 2 · answered Nov 20 '16 at 08:09

The problem is that you are filtering on rows, which means the row selector should be a 1d logical vector with one element per row, but you are passing a 2d logical matrix instead. With the trailing comma, R treats that matrix as one long logical vector, and every TRUE at a position beyond the last row selects a row that does not exist, which comes back as a row of NAs.
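To make the shape mismatch concrete, here is a small sketch using the data from the question. The variable names m and keep are only illustrative, and apply(m, 1, all) is one of several ways to collapse the matrix row by row.

```r
df <- data.frame(number1 = c(1:5, -13, -2, -34, 24, 33),
                 number2 = c(10:3, -73, -82))

# Comparing a whole data frame yields a logical matrix, one cell per value.
m <- df >= 0
dim(m)      # 10 rows by 2 columns

# Collapse each row to a single TRUE/FALSE: TRUE only if every cell is TRUE.
keep <- apply(m, 1, all)
length(keep)  # 10, one element per row of df

# A selector with one element per row filters rows without producing NA rows.
df[keep, ]
```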

If you do the following, it will get rid of all the rows with negative values without any resulting NAs

df <- data.frame(number1 = c(1:5,-13,-2,-34,24,33), number2 = c(10:3, -73, -82))
df
df[df[,1]>=0 & df[,2]>=0,]

where df[,1]>=0 & df[,2]>=0 returns a 1d logical vector that is TRUE for the rows where both columns are non-negative.

Thanks. But this works only if I know how many columns I would have in my df. In my case I wouldn't know how many columns the resulting df would have before I filter them. That is exactly why I was trying to apply the logical operator on the entire df. - SuperSatya, Nov 20 '16 at 08:13

score 0 · Answer 3 · answered Nov 20 '16 at 08:11

Try the following. Use AND between the conditions you want:

> df[df$number1>=0 & df$number2>=0,]
  number1 number2
1       1      10
2       2       9
3       3       8
4       4       7
5       5       6

Answered by OmaymaS.
