
I am very new to Rust so please excuse me if this is a trivial question.

I am trying to filter a dataframe as follows:

    let allowed = Series::from_iter(vec![
        "string1".to_string(),
        "string2".to_string(),
    ]);
    let df = LazyCsvReader::new(&fullpath)
        .has_header(true)
        .finish().unwrap()
        .filter(col("string_id").is_in(&allowed)).collect().unwrap(); 

It looks good to me since the signature of the is_in method looks like this:

    fn is_in(
        &self,
        _other: &Series
    ) -> Result<ChunkedArray<BooleanType>, PolarsError>

from [https://docs.rs/polars/latest/polars/series/trait.SeriesTrait.html#method.is_in]

However, when I compile it I get the following error:

    error[E0277]: the trait bound `Expr: From<&polars::prelude::Series>` is not satisfied
        --> src/main.rs:33:40
         |
    33   |         .filter(col("string_id").is_in(&allowed)).collect().unwrap();
         |                                  ----- ^^^^^^^^ the trait `From<&polars::prelude::Series>` is not implemented for `Expr`
         |                                  |
         |                                  required by a bound introduced by this call
         |
         = help: the following other types implement trait `From<T>`:
                   <Expr as From<&str>>
                   <Expr as From<AggExpr>>
                   <Expr as From<bool>>
                   <Expr as From<f32>>
                   <Expr as From<f64>>
                   <Expr as From<i32>>
                   <Expr as From<i64>>
                   <Expr as From<u32>>
                   <Expr as From<u64>>
         = note: required for `&polars::prelude::Series` to implement `Into<Expr>`
    note: required by a bound in `polars_plan::dsl::<impl Expr>::is_in`
        --> /home/myself/.cargo/registry/src/
         |
    1393 |     pub fn is_in<E: Into<Expr>>(self, other: E) -> Self {
         |                     ^^^^^^^^^^ required by this bound in `polars_plan::dsl::<impl Expr>::is_in`

    For more information about this error, try `rustc --explain E0277`.

To me this error looks very cryptic. I read the result of rustc --explain E0277 that says "You tried to use a type which doesn't implement some trait in a place which expected that trait", but this doesn't help in the slightest to identify which type doesn't implement which trait.

  • How do I fix this? Why doesn't it work?

NOTE: I know that writing lit(allowed) instead of &allowed works, but this is not possible because it prevents using allowed anywhere else. For example, I would like to do the following, but the following code gets (obviously) an error "use of moved value":

    let df = LazyCsvReader::new(&fullpath)
        .has_header(true)
        .finish().unwrap()
        .with_column(
            when(
                col("firstcolumn").is_in(lit(allowed))
                    .and(
                    col("secondcolumn").is_in(lit(allowed))
                    )
                )
                .then(lit("very good"))
                .otherwise(lit("very bad"))
                .alias("good_bad")
        )
        .collect().unwrap();

Bonus questions:

  • Why does it work with lit(allowed)? Shouldn't I pass the variable by reference as specified in the documentation?
  • How can I repeatedly use a Series for is_in like in the example above without having an error?

EDIT: I found a different signature for is_in that requires the second parameter to be an Expr, which would explain the need for lit. However, it's still not clear how to use the same Series multiple times without getting the "use of moved value" error.

  • You could use `cols(["firstcolumn", "secondcolumn"]).is_in(lit(allowed))` to test both columns at the same time which removes the need to re-use the series. – jqurious Jan 19 '23 at 04:24
  • Thanks for the answer, that's useful! Does this perform an implicit AND or OR? That is, is the result true only if both "firstcolumn" and "secondcolumn" are in "allowed", or is one sufficient? – 101001000100001 Jan 19 '23 at 11:52
  • Well it will just return true/false - when passed to `when(...)` - it will be AND. – jqurious Jan 19 '23 at 12:37

1 Answer

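The signature you found is for Series.is_in(), but your code calls Expr.is_in(), which has a different signature. As the compiler output shows, Expr.is_in() takes its argument by value and requires it to implement Into<Expr>. There is no From<&Series> implementation for Expr, so &allowed is rejected, whereas lit(allowed) already is an Expr and is therefore accepted.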
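To make the trait bound concrete, here is a minimal sketch that uses toy stand-ins only. The `Series`, `Expr`, `col`, `lit`, `series_of` and `build_filter` definitions below are hypothetical and are not the polars types; they only mirror the shape of the `is_in<E: Into<Expr>>(self, other: E)` signature quoted in the compiler output.

```rust
use std::sync::Arc;

// Toy stand-in for a series of strings (NOT polars::prelude::Series).
#[derive(Clone, Debug, PartialEq)]
struct Series(Arc<Vec<String>>);

// Hypothetical helper to build the toy series from string slices.
fn series_of(values: &[&str]) -> Series {
    Series(Arc::new(values.iter().map(|v| v.to_string()).collect()))
}

// Toy stand-in for an expression tree (NOT the polars Expr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(Series),
    IsIn(Box<Expr>, Box<Expr>),
}

impl Expr {
    // Same shape as the signature in the error message: the argument is
    // taken by value and must be convertible into an Expr.
    fn is_in<E: Into<Expr>>(self, other: E) -> Expr {
        Expr::IsIn(Box::new(self), Box::new(other.into()))
    }
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Takes ownership of the series and wraps it in an expression.
fn lit(series: Series) -> Expr {
    Expr::Literal(series)
}

fn build_filter(allowed: &Series) -> Expr {
    // col("string_id").is_in(allowed) would fail with E0277 here, because
    // nothing implements From<&Series> for Expr in this toy model either.
    // An Expr is trivially Into<Expr>, so wrapping a clone in lit() compiles.
    col("string_id").is_in(lit(allowed.clone()))
}
```

Because `build_filter` only borrows `allowed`, the caller keeps ownership and can use the series again afterwards.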

You can use cols() to select multiple columns:

    .with_columns([
        cols(["firstcolumn", "secondcolumn"]).is_in(lit(allowed))
    ])
┌─────────────┬──────────────┬─────────────┐
│ firstcolumn ┆ secondcolumn ┆ thirdcolumn │
│ ---         ┆ ---          ┆ ---         │
│ bool        ┆ bool         ┆ str         │
╞═════════════╪══════════════╪═════════════╡
│ false       ┆ false        ┆ moo         │
│ true        ┆ false        ┆ foo         │
│ true        ┆ true         ┆ keepme      │
│ true        ┆ true         ┆ andme       │
└─────────────┴──────────────┴─────────────┘

Used inside .when(), there is an implicit AND:

┌─────────────┬──────────────┬─────────────┬───────────┐
│ firstcolumn ┆ secondcolumn ┆ thirdcolumn ┆ good_bad  │
│ ---         ┆ ---          ┆ ---         ┆ ---       │
│ str         ┆ str          ┆ str         ┆ str       │
╞═════════════╪══════════════╪═════════════╪═══════════╡
│ a           ┆ b            ┆ moo         ┆ very bad  │
│ string1     ┆ no           ┆ foo         ┆ very bad  │
│ string2     ┆ string1      ┆ keepme      ┆ very good │
│ string1     ┆ string2      ┆ andme       ┆ very good │
└─────────────┴──────────────┴─────────────┴───────────┘

As for the moved value error, I have little Rust knowledge, but the compiler tells me:

    help: consider cloning the value if the performance cost is acceptable
       |
    15 |                 col("firstcolumn").is_in(lit(allowed.clone())).and(col("secondcolumn").is_in(lit(allowed))))
       |                                                     ++++++++

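And cloning a Series is a super cheap operation: as far as I know, a Series is a reference-counted handle, so clone() copies the handle rather than the underlying data.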
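As a rough illustration of why such a clone is cheap, the sketch below models a series as a reference-counted buffer. `ToySeries`, `consume` and `use_twice` are hypothetical names for this example and are not part of polars; the assumption is only that the real Series behaves similarly by sharing its data behind a reference count.

```rust
use std::sync::Arc;

// Toy stand-in: the data lives behind an Arc, so a clone shares it.
#[derive(Clone)]
struct ToySeries {
    values: Arc<Vec<String>>,
}

impl ToySeries {
    fn new(values: &[&str]) -> Self {
        ToySeries {
            values: Arc::new(values.iter().map(|v| v.to_string()).collect()),
        }
    }

    // True when both handles point at the same underlying buffer.
    fn shares_buffer_with(&self, other: &ToySeries) -> bool {
        Arc::ptr_eq(&self.values, &other.values)
    }

    // Number of live handles to the buffer.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.values)
    }
}

// Takes ownership of its argument, the way lit(allowed) does in the question.
fn consume(series: ToySeries) -> usize {
    series.values.len()
}

// Clone for every use except the last one, which can take the original.
fn use_twice(allowed: ToySeries) -> (usize, usize) {
    let first = consume(allowed.clone());
    let second = consume(allowed);
    (first, second)
}
```

The same pattern applies to the question's code: clone for the first is_in(lit(...)) call and pass the original to the last one, as the compiler hint suggests.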

jqurious