
What is the equivalent of drop_duplicates() from pandas in polars?

import polars as pl
df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]})
df

Output:

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 2   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 3   ┆ 3   │
└─────┴─────┴─────┘

Code:

df.drop_duplicates(["a", "b"])

Delivers the following error:

AttributeError: drop_duplicates not found

FObersteiner
keiv.fly

2 Answers


The equivalent method in polars is .unique(). Pass the columns to deduplicate on via subset:

import polars as pl
df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]})
df.unique(subset=["a","b"])

This returns the deduplicated frame:

shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 3   ┆ 3   │
└─────┴─────┴─────┘
blackraven
keiv.fly
`df.distinct()` can be run without any parameters. It appears it was only included to answer this question. Polars has very good docstrings; run `help(df.distinct)` or `help(df.[method])` to find examples and default parameters. More info: [Polars Cookbook](https://pola-rs.github.io/polars-book/user-guide/introduction.html) – Jenobi Jun 06 '22 at 20:41

It has been renamed to .unique().

See the Polars documentation.

Community
Claus8528