Boolean comparison between dataframe and vector wrong

Question

I simplified the issue to the following situation:

There is a dataframe df (which is a subsection of another where the previous four rows are just NaN):

  R_shunt R_Bosch    R_1    R_2    R_3   R_4   R_5   R_6    R_7   R_8
5   81317  138404 102678 135544 158359 83282 86151 90371 119277 98487
6  128501  118684 101001 102568 169562 78182 72561 85573  70014 95572

and there is another dataframe or vector or list (so far the type doesn't matter, apparently):

comp:

[1]  52398 115826  82691 139825 126657 125659  96578  94017  81740 126819

Now I want to know which cells of df are greater than the corresponding cells of comp:

df > comp
  R_shunt R_Bosch   R_1  R_2  R_3   R_4   R_5   R_6   R_7   R_8
5    TRUE    TRUE FALSE TRUE TRUE  TRUE  TRUE FALSE  TRUE  TRUE
6    TRUE   FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

which is somehow wrong. I expected a column-wise comparison, i.e. the first column of df should be compared only with the first element of comp, the second column with the second element, and so on. For example, R_Bosch in the sixth row should be TRUE. I don't get it.

When I set up comp like this:

comp = c(100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000)

and compare, it is correct:

> df> comp
  R_shunt R_Bosch  R_1  R_2  R_3   R_4   R_5   R_6   R_7   R_8
5   FALSE    TRUE TRUE TRUE TRUE FALSE FALSE FALSE  TRUE FALSE
6    TRUE    TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

Now, when I set it up manually but use the intended values:

comp = c(52398, 115826, 82691, 139825, 126657, 125659, 96578, 94017, 81740, 126819)

> df> comp
  R_shunt R_Bosch   R_1  R_2  R_3   R_4   R_5   R_6   R_7   R_8
5    TRUE    TRUE FALSE TRUE TRUE  TRUE  TRUE FALSE  TRUE  TRUE
6    TRUE   FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

it is still wrong.

I must be systematically misunderstanding something.

You're comparing 20 values in a data frame to 10 values in a vector. It's not clear to me how you want to match those 20 values to those 10 values. Clearly the top-left value of the data frame gets compared to `comp[1]`, but what gets compared to `comp[2]`: `df[1, 2]` or `df[2, 1]`? R makes one assumption, you make the other. — Gregor Thomas, Jun 04 '20 at 12:07
Which assumption does R make? In both data frames there are 10 columns, so I thought it would be compared column-wise? To be more precise: in my assumption comp[2] should be compared to df[1, 2] and df[2, 2] — Ben, Jun 04 '20 at 12:11
It is what I would call "columnwise" - it goes down the items in each column of `df`, `df[1, 1] > comp[1]`, then still in the first column `df[2, 1] > comp[2]`, etc. You seem to want what I would call "rowwise" - first row of `df` compared to `comp`, then second row of `df` compared to `comp`. But don't say "both data frames" - `comp` is just a vector. If you turn `comp` into a data frame you'll get a nice error message telling you that data frames of different shapes can't be compared: try `df > as.data.frame(t(comp))` — Gregor Thomas, Jun 04 '20 at 12:16
True, "row-wise" makes more sense. So this feature seems to not exist? Or is there any workaround or a package you know, maybe? Otherwise I will do it "column-wise" then :) Yes, the dataframe didn't work. Maybe I extend the vector to a dataframe, then it should work.. — Ben, Jun 04 '20 at 12:24
Where do you want the results? Do you want to replace the existing numbers or make new columns? — Chuck P, Jun 04 '20 at 12:25

Gregor Thomas · Accepted Answer · 2020-06-04T14:26:21.087

A few options:

## make comp the same shape as the data
## (fine with this example, impractical if you have more rows)
df > rbind(comp, comp)
#   R_shunt R_Bosch  R_1   R_2  R_3   R_4   R_5   R_6   R_7   R_8
# 5    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE  TRUE FALSE
# 6    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

## transpose df so the order works, then transpose it back
## (note this results in a matrix)
t((t(df)) > comp)
#   R_shunt R_Bosch  R_1   R_2  R_3   R_4   R_5   R_6   R_7   R_8
# 5    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE  TRUE FALSE
# 6    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

## Use Map to iterate in parallel - first column of df vs first item of comp, ...
## (returns a list, so we convert to data frame)
as.data.frame(Map(">", df, comp))
#   R_shunt R_Bosch  R_1   R_2  R_3   R_4   R_5   R_6   R_7   R_8
# 1    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE  TRUE FALSE
# 2    TRUE    TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
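
The first option can be extended to any number of rows by building the repeated matrix programmatically. This is only a sketch: the data are retyped from the question, and `comp_mat` and `res` are names introduced here for illustration.

```r
# Data retyped from the question (rows 5 and 6 of df, and comp)
df <- data.frame(
  R_shunt = c(81317, 128501), R_Bosch = c(138404, 118684),
  R_1 = c(102678, 101001),    R_2 = c(135544, 102568),
  R_3 = c(158359, 169562),    R_4 = c(83282, 78182),
  R_5 = c(86151, 72561),      R_6 = c(90371, 85573),
  R_7 = c(119277, 70014),     R_8 = c(98487, 95572),
  row.names = c("5", "6")
)
comp <- c(52398, 115826, 82691, 139825, 126657,
          125659, 96578, 94017, 81740, 126819)

# Repeat comp once per row of df; byrow = TRUE places comp[j] in column j
comp_mat <- matrix(comp, nrow = nrow(df), ncol = ncol(df), byrow = TRUE)

# Same shape on both sides, so every cell meets the value for its own column
res <- df > comp_mat
res
```

Because `comp_mat` has the same dimensions as `df`, no recycling takes place, so the result should agree with the other options above.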

https://stackoverflow.com/questions/8753531/repeat-rows-of-a-data-frame-n-times — Ben, Jun 04 '20 at 13:01

score 2 · Answer 2 · answered Jun 04 '20 at 12:35


When you compare a data.frame and a vector, elements in the vector will be recycled to match the dimension of the data.frame. The direction of matching is by column. Take a 2x3 data.frame and a vector of length 3 for example:

df

#   V1 V2 V3
# 1  a  b  c
# 2  a  b  c

vec

# [1] d e f

When you use the arithmetic (+, -, *, /) or comparison (>, <, ==) operators on them, the elements are paired as follows:

df > vec

#       V1     V2     V3
# 1  a > d  b > f  c > e
# 2  a > e  b > d  c > f

instead of

#       V1     V2     V3
# 1  a > d  b > e  c > f
# 2  a > d  b > e  c > f
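
To see exactly which vector element meets which cell, the pairing can be computed by hand. The sketch below uses made-up numeric stand-ins for the letters (`df2`, `vec`, `pos`, `idx`, `paired` and `recovered` are illustrative names): the cells are numbered down each column, and the vector index wraps around with a modulo.

```r
# Hypothetical numeric stand-ins for the 2x3 example above
df2 <- data.frame(V1 = c(1, 1), V2 = c(2, 2), V3 = c(3, 3))
vec <- c(10, 20, 30)

# Number the cells column by column: 1, 2 in V1; 3, 4 in V2; 5, 6 in V3
pos <- matrix(seq_len(nrow(df2) * ncol(df2)), nrow = nrow(df2))

# Recycling: cell number k is paired with vec[(k - 1) %% length(vec) + 1]
idx <- (pos - 1) %% length(vec) + 1
paired <- matrix(vec[idx], nrow = nrow(df2))
paired
#      [,1] [,2] [,3]
# [1,]   10   30   20
# [2,]   20   10   30

# Cross-check: subtracting df2 from (df2 + vec) recovers the paired values
recovered <- as.matrix(df2 + vec) - as.matrix(df2)
all(recovered == paired)
```

Reading `paired` against the first table above, 10, 20, 30 play the roles of d, e, f.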


Darren Tsai



@Ben this is another way to visualize the columnwise vs rowwise discussion we had in the question comments. – Gregor Thomas Jun 04 '20 at 14:25
I got that, I just wonder about the systematic rule behind it. To me, it looks somewhat like computing a determinant. In other words: how do you know exactly which cell is compared to which cell? – Ben Jun 05 '20 at 08:54
