Python pandas - How to create separate lists of duplicates and uniques?

Question

I have a list of unique values from the column ACCOUNTMANAGER, and I need to pick out the sorted duplicates and save them to a separate file.

Is it possible to use duplicates or something similar to pick identical column values and save them to separate lists?

Let's say ACCOUNTMANAGER contains the names ['Jack', 'Jack', 'Dane', 'Jessica', 'Jessica', 'Jessica']. I would like a list of all Jacks, a list for Dane even though there is only one value, and a list of all Jessicas. How can I do this using uniques and duplicates? Here is my code:

uniques = df['ACCOUNTMANAGER'].unique()
print(uniques)

jezrael · Accepted Answer · 2021-05-10T10:11:37.397


You can create a dictionary of lists of the repeated values by grouping by the column and aggregating that same column into lists:

d = df.groupby('ACCOUNTMANAGER')['ACCOUNTMANAGER'].agg(list).to_dict()

print (d['Jack'])
print (d['Dane'])
print (d['Jessica'])
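For reference, here is a minimal self-contained sketch of the same approach. The sample DataFrame is an assumption built from the names given in the question; the real data will have more columns.

```python
import pandas as pd

# Sample data assumed from the question; the real DataFrame is not shown.
df = pd.DataFrame(
    {"ACCOUNTMANAGER": ["Jack", "Jack", "Dane", "Jessica", "Jessica", "Jessica"]}
)

# Group by the column and collect each group's values into a plain list.
d = df.groupby("ACCOUNTMANAGER")["ACCOUNTMANAGER"].agg(list).to_dict()

print(d["Jack"])     # ['Jack', 'Jack']
print(d["Dane"])     # ['Dane']
print(d["Jessica"])  # ['Jessica', 'Jessica', 'Jessica']
```

A name that occurs only once still gets its own one-element list, so no special case is needed for uniques.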

In Python, creating variables from strings is not recommended, but it is possible, e.g. with globals:

for n, vals in df.groupby('ACCOUNTMANAGER')['ACCOUNTMANAGER'].agg(list).items():
    globals()[n] =  vals

print (Jessica)

edited May 10 '21 at 10:11

answered May 10 '21 at 09:54

jezrael


using lines: d = df.groupby('ACCOUNTMANAGER')['ACCOUNTMANAGER'].agg(uniques).to_dict() print(d) I get error: TypeError: ndarray() missing required argument 'shape' (pos 1) – Levsha May 10 '21 at 10:00
@ErikIlonen - There is `list`, not `uniques` – jezrael May 10 '21 at 10:01
That works nicely, but how can I pick whole dataframe rows with it and make separate lists? – Levsha May 10 '21 at 10:07
Thanks. Is it possible to rather loop 'ACCOUNTMANAGER' and create separate lists based on the occurrence of list? And I still don't get the whole rows df – Levsha May 10 '21 at 10:32
@ErikIlonen - Not understand, what is reason for it? – jezrael May 10 '21 at 11:03
@ErikIlonen - `And I still don't get the whole rows df ` Can you be more specific? – jezrael May 10 '21 at 11:03
One more thing. This code doesn't print anything for some reason: for n, vals in df.groupby('ACCOUNTMANAGER')['ACCOUNTMANAGER'].agg(list).items(): if vals == uniques[1]: print(vals) – Levsha May 10 '21 at 11:36
@ErikIlonen - so compare by same value `uniques[1]` ? And never print any value? – jezrael May 10 '21 at 11:41
uniques[1] contains equivalent value to 'Jessica' and it prints nothing for some reason – Levsha May 10 '21 at 11:44
@ErikIlonen - because `vals` contains `['Jessica', 'Jessica', 'Jessica']`; you need to compare with `n`, like `if n == uniques[1]` – jezrael May 10 '21 at 11:46
Do you have any solution to how to get df rows that are equivalent to those group by sets? Let's say all df rows that are 'Jessica'? – Levsha May 10 '21 at 11:55
@ErikIlonen - Use `df[df.ACCOUNTMANAGER == n]` – jezrael May 10 '21 at 11:56
@ErikIlonen - What is error? Because there is using [`boolean indexing`](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing), so problem something else – jezrael May 10 '21 at 12:10
Traceback (most recent...): File "/Users/erik.ilonen/Desktop/Projekti_csv_data/Kolmas_testiohjelma/All_clientaccounts_with_auto_invest.py", line 106, in print(df[0]) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__ indexer = self.columns.get_loc(key) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc raise KeyError(key) from err KeyError: 0 erik.ilonen@32-MacBook-Air Kolmas_testiohjelma % – Levsha May 10 '21 at 12:14
@ErikIlonen - It means the original DataFrame was overwritten somewhere in your code; check what `print (df)` returns. – jezrael May 10 '21 at 12:18
df prints correctly but with that line of code I get this error: df = df['ACCOUNTMANAGER' == n] File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__ indexer = self.columns.get_loc(key) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc raise KeyError(key) from err KeyError: False erik.ilonen@32-MacBook-Air Kolmas_testiohjelma % – Levsha May 10 '21 at 12:24

That was my bad. I changed df.ACCOUNTMANAGER to 'ACCOUNTMANAGER'. It works fine now. Many thanks jezrael and sorry for the inconvenience. – Levsha May 10 '21 at 12:27
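To summarize the row-selection part of this thread, here is a hedged sketch of two ways to get whole rows per manager. The `CLIENT` column and its values are hypothetical, added only so the rows have something besides the name. It also shows why `df['ACCOUNTMANAGER' == n]` raised `KeyError: False`: that expression compares two plain strings, which evaluates to `False`, and pandas then looks for a column named `False`.

```python
import pandas as pd

# Sample data; CLIENT is a hypothetical extra column for illustration.
df = pd.DataFrame(
    {
        "ACCOUNTMANAGER": ["Jack", "Jack", "Dane", "Jessica", "Jessica", "Jessica"],
        "CLIENT": ["c1", "c2", "c3", "c4", "c5", "c6"],
    }
)

# Option 1: boolean indexing for a single name.
# The comparison must be on the column (a Series), not on the column's name.
jessica_rows = df[df["ACCOUNTMANAGER"] == "Jessica"]

# Option 2: one sub-DataFrame per manager, built in a single pass with groupby.
rows_by_manager = {name: group for name, group in df.groupby("ACCOUNTMANAGER")}

print(rows_by_manager["Jessica"])
```

Both options keep the original row index, so `rows_by_manager["Jessica"]` and `jessica_rows` hold the same rows.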

score 0 · Answer 2 · answered May 10 '21 at 10:18


If it's for only one column, you can also do it this way (using pandas value_counts):

>>> import pandas as pd
>>> df = pd.DataFrame({'ACCOUNTMANAGER': ['Jack', 'Jack', 'Dane', 'Jessica', 'Jessica', 'Jessica']})

>>> df.ACCOUNTMANAGER.value_counts()
Jessica    3
Jack       2
Dane       1
Name: ACCOUNTMANAGER, dtype: int64

>>> for key, value in df.ACCOUNTMANAGER.value_counts().items():
...     print([key] * value)

['Jessica', 'Jessica', 'Jessica']
['Jack', 'Jack']
['Dane']
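If the goal is literally two lists, one with the values that occur more than once and one with the values that occur only once, a possible sketch uses `Series.duplicated(keep=False)`. This is a different technique from the `value_counts` loop above, and the sample data is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question.
s = pd.Series(
    ["Jack", "Jack", "Dane", "Jessica", "Jessica", "Jessica"],
    name="ACCOUNTMANAGER",
)

# keep=False marks every occurrence of a repeated value, not just the later ones.
is_dup = s.duplicated(keep=False)

duplicates = s[is_dup].tolist()  # values that appear more than once
uniques = s[~is_dup].tolist()    # values that appear exactly once

print(duplicates)  # ['Jack', 'Jack', 'Jessica', 'Jessica', 'Jessica']
print(uniques)     # ['Dane']
```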


sam


Thanks. How can I pick whole rows of the dataframe while finding occurrences? – Levsha May 10 '21 at 10:51
Within the `for` loop, use a conditional filter on `df`, like `new_df = df[df.ACCOUNTMANAGER == key]`. This `new_df` is your selected df with the rows for that one key. If you need the rows for each unique value, then I recommend `groupby`, which is a very cost-efficient method. See the pandas docs for more info. Otherwise this will do! – sam May 10 '21 at 10:57
Thanks sam as well – Levsha May 10 '21 at 11:28
