Search values across all columns in R data frame

Question

Here's a sample data frame.

df = data.frame(company = c('a', 'b', 'c', 'd'),
                 bond = c(0.2, 1, 0.3, 0),
                 equity = c(0.7, 0, 0.5, 1),
                 cash = c(0.1, 0, 0.2, 0))
df

  company bond equity cash
1       a  0.2    0.7  0.1
2       b  1.0    0.0  0.0
3       c  0.3    0.5  0.2
4       d  0.0    1.0  0.0

I need to find the companies that have 1.0 in any column. The expected result is b and d.

Please provide a solution that works for more than 20 columns. Solutions like df %>% filter(bond == 1) only work for searching one particular column.

dplyr or data.table solutions are acceptable.

Thanks.

Checking for equality with floats is an error-prone business, fyi. Try looking at `x = (.1+.2)*(10/3)` and then testing `x==1`... — Frank, Jun 29 '16 at 02:59
Fwiw, here are some variations on the question: http://stackoverflow.com/q/28233561/ and http://stackoverflow.com/q/25692392/ — Frank, Jun 29 '16 at 03:06
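
The floating-point concern raised in the first comment can be sidestepped by comparing within a tolerance instead of using `==`. A minimal sketch, assuming the numeric columns are everything except the first column and that a tolerance of 1e-8 (an arbitrary choice made here, not taken from the thread) suits the data:

```r
df <- data.frame(company = c('a', 'b', 'c', 'd'),
                 bond = c(0.2, 1, 0.3, 0),
                 equity = c(0.7, 0, 0.5, 1),
                 cash = c(0.1, 0, 0.2, 0))

tol <- 1e-8                      # assumed tolerance; adjust to the data
m <- as.matrix(df[-1])           # numeric columns only
near_one <- abs(m - 1) < tol     # logical matrix, TRUE where a value is within tol of 1
res_tol <- df[rowSums(near_one) > 0, ]
as.character(res_tol$company)
#[1] "b" "d"
```

For a single value, `isTRUE(all.equal(x, 1))` performs a similar tolerance-based check.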

akrun · Answer 1 · 2016-06-29T03:14:19.107

5

We can also use Reduce with ==, adding up the per-column logical vectors and keeping the rows where the sum is non-zero:

res <- df[Reduce(`+`, lapply(df[-1], `==`, 1))!=0,]
res
#   company bond equity cash
#2       b    1      0    0
#4       d    0      1    0

res$company
#[1] b d
#Levels: a b c d

edited Jun 29 '16 at 03:14

answered Jun 29 '16 at 02:53

akrun


2

I think the akrun way is `df[!!rowSums(df==1),]` :) – Pierre L Jun 29 '16 at 03:15
@PierreLafortune Yes, you are right, but somebody already posted solution with `rowSums`, so I didn't want to get into a tussle about plagiarism :-) – akrun Jun 29 '16 at 03:16

score 4 · Accepted Answer · answered Jun 29 '16 at 02:44


Comparing the numeric columns to 1 gives a logical matrix, and rowSums on that matrix counts the matches in each row:

df[rowSums(df[-1] == 1.0) != 0, 'company']
[1] b d
Levels: a b c d
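
One caveat with the `rowSums` approach is missing values: an `NA` in any column makes that row's sum `NA`, and indexing with `NA` returns an `NA` entry. A small sketch, assuming a hypothetical variant of the data (`df_na`, not from the question) with one missing value, shows how `na.rm = TRUE` avoids this:

```r
# Hypothetical variant of the sample data with a missing bond value
df_na <- data.frame(company = c('a', 'b', 'c', 'd'),
                    bond = c(NA, 1, 0.3, 0),
                    equity = c(0.7, 0, 0.5, 1),
                    cash = c(0.1, 0, 0.2, 0))

# na.rm = TRUE ignores NA comparisons, so each row is judged on its known values
hits <- rowSums(df_na[-1] == 1, na.rm = TRUE) > 0
as.character(df_na$company[hits])
#[1] "b" "d"
```

Whether ignoring missing values is the right choice depends on the data; a row with an `NA` might still hold a 1 that was never recorded.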


Psidom


Btw, just aside...could you share how you would search for non-numeric columns? – shawnl Jun 29 '16 at 02:47
Do you mean the filter condition is different for each column? I don't think this method will work for that case. Anyway you need to be more specific about what the columns and conditions are. – Psidom Jun 29 '16 at 02:51
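
If the follow-up question is read as "find a given string in any text column", the same `rowSums` idea carries over. A minimal sketch under that assumption; the data frame `df_chr`, its `sector` and `rating` columns, and the search term `"tech"` are all hypothetical and not part of the thread:

```r
# Hypothetical data with character columns
df_chr <- data.frame(company = c('a', 'b', 'c', 'd'),
                     sector = c('tech', 'bank', 'tech', 'retail'),
                     rating = c('AA', 'tech', 'B', 'AA'),
                     stringsAsFactors = FALSE)

chr_cols <- sapply(df_chr, is.character)            # pick the character columns
match_any <- rowSums(df_chr[chr_cols] == "tech") > 0  # TRUE if any column equals "tech"
df_chr$company[match_any]
#[1] "a" "b" "c"
```

This only covers exact matches with one shared condition; different conditions per column, as the reply notes, would need a different approach.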

score 4 · Answer 3 · answered Jun 29 '16 at 02:50


Another option:

df[unique(row(df[-1])[df[-1] == 1L]),]
#  company bond equity cash
#2       b    1      0    0
#4       d    0      1    0

df$company[unique(row(df[-1])[df[-1] == 1L])]
#[1] b d
#Levels: a b c d


thelatemail


1

@PierreLafortune - in this case, yep, but I think the unique is needed if there is a 1 in multiple columns. – thelatemail Jun 29 '16 at 03:40
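
The point about `unique` can be seen with `which(..., arr.ind = TRUE)`, which lists every matching cell as a row/column pair. A sketch, assuming a hypothetical extra company `e` (not in the question) that has a 1 in two columns:

```r
# Sample data plus a hypothetical row with 1 in two columns
df2 <- data.frame(company = c('a', 'b', 'c', 'd', 'e'),
                  bond = c(0.2, 1, 0.3, 0, 1),
                  equity = c(0.7, 0, 0.5, 1, 1),
                  cash = c(0.1, 0, 0.2, 0, 0))

# One entry per matching cell, so row 5 is listed twice
hits <- which(as.matrix(df2[-1]) == 1, arr.ind = TRUE)
rows <- sort(unique(hits[, "row"]))   # collapse repeated row indices
as.character(df2$company[rows])
#[1] "b" "d" "e"
```

Without `unique`, indexing by `hits[, "row"]` would return company `e` twice.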

score 2 · Answer 4 · answered Jun 29 '16 at 02:47


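library(dplyr)  # provides %>%, select() and filter(); plyr is called via plyr::
# Build one filter() call per column as text, evaluate each, and row-bind the results
var <- df %>% select(bond:cash) %>% names
plyr::ldply(var, function(x) paste("filter(df,", x, "==1)") %>% parse(text=.) %>% eval)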
  company bond equity cash
1       b    1      0    0
2       d    0      1    0
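
Building and evaluating code strings can be avoided in newer dplyr. A minimal sketch, assuming dplyr 1.0.4 or later (which postdates this thread) for `if_any()`; this is an alternative technique, not what the answer above does:

```r
library(dplyr)

df <- data.frame(company = c('a', 'b', 'c', 'd'),
                 bond = c(0.2, 1, 0.3, 0),
                 equity = c(0.7, 0, 0.5, 1),
                 cash = c(0.1, 0, 0.2, 0))

# Keep rows where any numeric column equals 1
res_any <- df %>% filter(if_any(where(is.numeric), ~ .x == 1))
as.character(res_any$company)
#[1] "b" "d"
```

Because `if_any()` tests each row once, a row with 1 in several columns appears only once, whereas row-binding one filtered result per column would repeat it.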


Adam Quek

