What is the equivalent of `DataFrame.drop_duplicates()` from pandas in polars?

Question

What is the equivalent of drop_duplicates() from pandas in polars?

import polars as pl
df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]})
df

Output:

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 2   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 3   ┆ 3   │
└─────┴─────┴─────┘

Code:

df.drop_duplicates(["a", "b"])

This raises the following error:

AttributeError: drop_duplicates not found

score 26 · Accepted Answer · edited Aug 16 '22 at 15:06


The equivalent method in polars is `.unique()`:

import polars as pl
df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]})
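# keep="first" and maintain_order=True pin down which row survives and the row order
df.unique(subset=["a", "b"], keep="first", maintain_order=True)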

This produces the expected output:

shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 3   ┆ 3   │
└─────┴─────┴─────┘

edited by blackraven

answered Feb 20 '22 at 16:57 by keiv.fly


`df.distinct()` can be run without any parameters; it appears the parameters were only included to answer this question. Polars has very good docstrings: run `help(df.distinct)` or `help(df.[method])` to find examples and default parameters. More info: [Polars Cookbook](https://pola-rs.github.io/polars-book/user-guide/introduction.html) – Jenobi Jun 06 '22 at 20:41

score 2 · Answer 2 · edited Mar 01 '23 at 08:35


It has been renamed to `.unique()`.

See the Polars documentation.

edited by Community

answered Aug 15 '22 at 09:49 by Claus8528
