Pandas: How to use method df.name.unique() dynamically to find all unique column entries?

Question

This isn't exactly a question about how to find all unique entries in a column of a dataframe, since I know how I'd do that:

import pandas as pd

df = pd.read_csv('test.txt', sep=r'\s+')  # whitespace-delimited; delim_whitespace=True is deprecated in recent pandas

uniques = {}
for key in df.keys():
    uniques[key] = set(df[key])  # keep each column's unique entries instead of overwriting one variable

What this is really about is how to do it dynamically with pandas' own methods/functions, and about a strange syntax that I can't see why anyone would use:

In [101]: list(df.keys())
Out[101]: ['id_cliente', 'id_ordine', 'data_ordine', 'id_medium']

With these keys, you can find their unique column values with the following syntax:

In [102]: df.id_cliente.unique()
Out[102]: array(['madinside', 'lisbeth19'], dtype=object)

I can't use this method dynamically like in my iteration above, can I? I can only use it if I find out the keys first and manually type in the df.NAME.unique() statement, right?

Why is this a thing? Is this method exclusively intended for interactive use from the python console? Is there a native pandas.DataFrame method for determining unique values dynamically?

SCool · Answer 1 · 2019-09-20T16:07:47.733


Does this work for your df?

unique_stuff = [{col: set(df[col].unique())} for col in df.columns]

edit: actually I don't think you even need a set here. I've removed it below:

unique_stuff = [{col: df[col].unique().tolist()} for col in df.columns]
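If a single mapping is handier than a list of one-key dicts, the same idea can be sketched as a dict comprehension. The small DataFrame here is made up for illustration (only the column names and the two id_cliente values come from the question), so treat it as a sketch rather than the asker's real data:

```python
import pandas as pd

# Hypothetical sample data; the real 'test.txt' from the question is not shown.
df = pd.DataFrame({
    "id_cliente": ["madinside", "lisbeth19", "madinside"],
    "id_medium": [1, 2, 3],
})

# Bracket indexing takes the column name as an ordinary string,
# so it works with a loop variable.
unique_by_column = {col: df[col].unique().tolist() for col in df.columns}
```

Series.unique() keeps values in order of first appearance, so the lists stay in the order the data was read.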

edited Sep 20 '19 at 16:07

answered Sep 20 '19 at 16:02

SCool


Yeah I could do that, it's just that I found it weird that you can access a column of a dataframe by using its column name *string* as a *code fragment*. Doesn't this mean that when a DataFrame instance is created, the column names are used to create static variables, where the name of the static variable is also the name of the column? Am I the only one who finds this weird? https://stackoverflow.com/questions/68645/are-static-class-variables-possible-in-python – J.Doe Sep 20 '19 at 16:10

score 1 · Accepted Answer · answered Sep 20 '19 at 16:06


You can do it dynamically

df.T.apply(pd.Series.unique,1)
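Another way to read "dynamically", as a rough sketch: the df.NAME form is shorthand for df['NAME'], resolved at attribute-lookup time rather than stored as a class variable, so a name held in a string can go through brackets or getattr. The sample frames below are invented for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical sample data mirroring two of the question's column names.
df = pd.DataFrame({
    "id_cliente": ["madinside", "lisbeth19", "madinside"],
    "id_medium": [1, 2, 3],
})

col = "id_cliente"                            # column name held in a variable
via_brackets = df[col].unique()               # the usual dynamic form
via_getattr = getattr(df, col).unique()       # same thing as df.id_cliente.unique()

# The column name is not an attribute defined on the DataFrame class.
is_class_attribute = col in vars(pd.DataFrame)

# Attribute access breaks down when a column name collides with a method:
clash = pd.DataFrame({"count": [1, 1, 2]})
attribute_is_method = callable(clash.count)   # the DataFrame.count method, not the column
column_values = clash["count"].unique().tolist()
```

Because of such name clashes (and names that are not valid Python identifiers), bracket indexing is the safer choice in scripts; the attribute form is mainly a convenience for interactive use.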

answered Sep 20 '19 at 16:06

BENY


Thanks. But could you maybe tell me if the syntax is actually weird or if I'm just weird? Look at my other comment if you want to know what I mean. – J.Doe Sep 20 '19 at 19:47
