This isn't exactly a question about how to find all unique entries in a column of a dataframe, since I know how I'd do that:
import pandas as pd
df = pd.read_csv('test.txt', sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
for key in list(df.keys()):
    uni = set(df[key])
What this is really about is how to do it dynamically with pandas' own methods/functions, and about a strange syntax that I can't see why anyone would use:
In [101]: list(df.keys())
Out[101]: ['id_cliente', 'id_ordine', 'data_ordine', 'id_medium']
With these keys, you can find their unique column values with the following syntax:
In [102]: df.id_cliente.unique()
Out[102]: array(['madinside', 'lisbeth19'], dtype=object)
I can't use this method dynamically like in my iteration above, can I? I can only use it if I find out the keys first and manually type in the df.NAME.unique() statement, right?
Why is this a thing? Is this method exclusively intended for interactive use from the Python console? Is there a native pandas.DataFrame method for determining unique values dynamically?
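For reference, here is a minimal sketch of what a dynamic version might look like. It relies on df.id_cliente being attribute-style shorthand for df['id_cliente'], so the bracket form accepts a column name held in a variable. The small DataFrame is a hypothetical stand-in for test.txt (only the column names come from the output above), and the variable names uniques and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for test.txt; only the column names are from the question.
df = pd.DataFrame({
    'id_cliente': ['madinside', 'lisbeth19', 'madinside'],
    'id_ordine': [1, 2, 3],
})

# Bracket indexing takes the column name as a string, so it works in a loop
# or comprehension where attribute access (df.NAME) cannot be used.
uniques = {col: df[col].unique() for col in df.columns}

# DataFrame.nunique() returns the number of distinct values per column.
counts = df.nunique()

for col, values in uniques.items():
    print(col, list(values), counts[col])
```

Series.unique() returns values in order of first appearance, whereas set(df[key]) has no guaranteed order, so the two approaches can differ in ordering even when they agree on content.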