3

I have a DataFrame and I want to filter the rows where column During_Cabg OR column During_Pci have a value of 1. Here's what I'm doing:

pci_or_cabg = @chain df begin
    select([:During_Cabg, :During_Pci] .=> ByRow(x -> coalesce.(x, 0)); renamecols=false)
    subset(:During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)))
end

The problem is that this line: ByRow(==(1)), :During_Pci => ByRow(==(1) seems to imply an AND not OR. The result I'm getting is values where BOTH columns are 1 (not what I want).

How to subset a DataFrame with multiple conditions (AND or OR) with multiple columns?

Thank you!

SevenSouls
  • 539
  • 3
  • 12
  • `df[(df.During_Cabg .== 1) .| (df.During_Pci .== 1), :]` works! :-) – Mark Aug 15 '23 at 05:29
  • 1
    Thank you Mark. It does work but I was looking for an idiomatic way to do it in a DataFrames.jl chain – SevenSouls Aug 15 '23 at 05:55
  • you could use `filter([:During_Cabg, :During_Pci] => (c, p) -> (c == 1) | (p == 1), df)` – Mark Aug 15 '23 at 06:06
  • 1
    or `@chain df begin select(:During_Cabg, :During_Pci) subset([:During_Cabg, :During_Pci] => ByRow((c, p) -> (c == 1) | (p == 1))) end ` – Mark Aug 15 '23 at 06:13
  • Thank you Mark. This works and I like to use subset instead of filter to stay consistent with the other DataFrames.jl functions. – SevenSouls Aug 16 '23 at 00:14

1 Answers1

3

In subset AND condition is used. If you want to use OR you need to pass it in a single condition. A general way to do it would be:

subset(df, AsTable(columns) => ByRow(x -> any(predicate, x)))

or

subset(df, columns => ByRow((x...) -> any(predicate, x)))

if you want to apply the same predicate to all columns.

If you want a shorter syntax consider using one of the meta-packages, for example with DataFramesMeta.jl you can write):

@rsubset(df, :During_Cabg == 1 || :During_Pci == 1)

which works nice with chaining.

Bogumił Kamiński
  • 66,844
  • 3
  • 80
  • 107