
I was wondering if it is possible to make a new column in a pandas DataFrame that holds, for each row, a list of every value in the same index group NOT including the value of the row itself. For example, in the df below, the first row should get [b, c] in a new column 'list', because the row's own value is 'a'. Is this possible to do per index?

I have tried this, but it returns a list of all values combined per index:

import pandas as pd 
d = {'index': [1, 1, 1, 2, 2, 3], 'col1': ['a', 'b', 'c', 'd', 'e, f', 'g']}
df = pd.DataFrame(d)
df = df.groupby("index")["col1"].apply(list)

Whereas I am looking for something that retains all of the rows and puts each list in a new column, without the row's own value included.
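The attempt above collapses the frame to one row per index. A minimal sketch of keeping all six rows, assuming the same toy data: build the per-index list once and map it back onto every row with Series.map. The column name group_list is hypothetical, and at this stage each row's own value is still included.

```python
import pandas as pd

d = {'index': [1, 1, 1, 2, 2, 3], 'col1': ['a', 'b', 'c', 'd', 'e, f', 'g']}
df = pd.DataFrame(d)

# One list per index value (the same result as the groupby/apply attempt)
groups = df.groupby("index")["col1"].agg(list)

# Look up each row's index value in `groups`, so all rows are kept
df["group_list"] = df["index"].map(groups)
print(df)
```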


Thank you for any help!!

FrankMank1

1 Answer


We can use explode with groupby to create the whole list within each index, then take the set difference with each row's own values:

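# split 'e, f' into ['e', ' f'] so each item can be removed on its own
df['l']=df.col1.str.split(',')
# every item in the row's index group, repeated on each row of that group
df['new']=df.explode('l').groupby('index')['l'].agg(list).reindex(df['index']).tolist()
# group items minus the row's own items (a set, so order is not kept)
df['List']=(df.new.apply(set)-df['l'].apply(set)).apply(list)
# a single-row group leaves an empty list; fall back to the row's own value
df.loc[~df.List.astype(bool),'List']=df.l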
df
   index  col1        l         new     List
0      1     a      [a]   [a, b, c]   [c, b]
1      1     b      [b]   [a, b, c]   [a, c]
2      1     c      [c]   [a, b, c]   [a, b]
3      2     d      [d]  [d, e,  f]  [e,  f]
4      2  e, f  [e,  f]  [d, e,  f]      [d]
5      3     g      [g]         [g]      [g]
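The set difference above loses the original order (row 0 comes out as [c, b]) and collapses repeated items. A minimal order-preserving sketch, assuming the same data and the same single-row fallback as the answer. Unlike the answer, it also strips the whitespace left after splitting 'e, f', and the helper name others is hypothetical.

```python
import pandas as pd

d = {'index': [1, 1, 1, 2, 2, 3], 'col1': ['a', 'b', 'c', 'd', 'e, f', 'g']}
df = pd.DataFrame(d)

# Split on ',' and strip spaces so 'e, f' -> ['e', 'f'] (the answer keeps ' f')
df['l'] = df['col1'].str.split(',').apply(lambda parts: [p.strip() for p in parts])

# All items of each index group, mapped back onto every row of that group
df['new'] = df['index'].map(df.explode('l').groupby('index')['l'].agg(list))

def others(own, full):
    # Keep the order of `full`, dropping every item that appears in `own`
    rest = [v for v in full if v not in own]
    # Same fallback as the answer: a single-row group keeps its own value
    return rest if rest else own

df['List'] = [others(own, full) for own, full in zip(df['l'], df['new'])]
print(df[['index', 'col1', 'List']])
```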

Update

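l=[]
for x, y in zip(df.l, df.new):
    # work on copies so remove() does not modify the lists stored in df
    x=x.copy()
    y=y.copy()
    # drop one occurrence of each of the row's own items from the group list
    for i in x:
        if i in y:
            y.remove(i)
    l.append(y)

l
[['b', 'c'], ['a', 'c'], ['a', 'b'], ['e', ' f'], ['d'], []]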
df['List']=l
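Why the .copy() calls in the loop above matter can be shown with plain Python lists. This is a small sketch under the assumption that, as after the reindex step, rows of the same index hold references to one shared list object; the names shared, rows and own_values are hypothetical.

```python
# Three "rows" of one index group that all reference the same list object
shared = ['a', 'b', 'c']
rows = [shared, shared, shared]
own_values = ['a', 'b', 'c']

# Without copying: every remove() mutates the one shared list
no_copy = []
for own, full in zip(own_values, rows):
    full.remove(own)
    no_copy.append(full)
print(no_copy)    # [[], [], []] because all entries are the same emptied list

# With copying: each row removes its value from its own private list
shared = ['a', 'b', 'c']
rows = [shared, shared, shared]
with_copy = []
for own, full in zip(own_values, rows):
    full = full.copy()
    full.remove(own)
    with_copy.append(full)
print(with_copy)  # [['b', 'c'], ['a', 'c'], ['a', 'b']]
```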
BENY
  • Thanks for this great answer! What if 'a' is repeated in multiple rows and we only want to get rid of one of them, not all? If we have d = {'index': [1, 1, 1, 2, 2, 3], 'col1': ['a', 'a', 'c', 'd', 'e, f', 'g']}, the list columns will get rid of both of the "a" values. Is there a way to only remove the one from that row? – FrankMank1 May 29 '20 at 00:48
  • So what I'm saying, is it possible to drop the first duplicate only? And for row 4, if we had a repeat "e, f", would we be able to drop only one e, f from [e, f, e, f, d]? – FrankMank1 May 29 '20 at 00:49
  • @FrankMank1 that is harder; you would need a for loop with remove. – BENY May 29 '20 at 00:51
  • Sure! Could you also explain why you used x = x.copy() and y = y.copy() instead of the values themselves? – FrankMank1 May 29 '20 at 04:07
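  • @FrankMank1 we need the copy because remove() changes the list in place. Without it, the items are removed from the lists stored in the dataframe itself, and since rows with the same index share one list object, the output ends up empty. See https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list – BENY May 29 '20 at 13:20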
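For the duplicate case raised in the comments ('a' appearing twice in index 1), a sketch that removes exactly as many copies as the row itself holds, using collections.Counter instead of list.remove. The helper name remove_once is hypothetical. Because it builds a new list rather than mutating an existing one, no .copy() is needed.

```python
from collections import Counter

import pandas as pd

def remove_once(own, full):
    """Return `full` minus one copy of each item in `own`, keeping order."""
    to_drop = Counter(own)          # how many copies of each item to skip
    result = []
    for item in full:
        if to_drop[item] > 0:       # a missing key counts as 0
            to_drop[item] -= 1      # skip this copy and use up one drop
        else:
            result.append(item)
    return result

# Data from the comments: 'a' appears twice in index 1
d = {'index': [1, 1, 1, 2, 2, 3], 'col1': ['a', 'a', 'c', 'd', 'e, f', 'g']}
df = pd.DataFrame(d)
df['l'] = df['col1'].str.split(',')
df['new'] = df['index'].map(df.explode('l').groupby('index')['l'].agg(list))
df['List'] = [remove_once(own, full) for own, full in zip(df['l'], df['new'])]
print(df['List'].tolist())
# [['a', 'c'], ['a', 'c'], ['a', 'a'], ['e', ' f'], ['d'], []]
```

With a repeated 'e, f' as in the second comment, remove_once(['e', ' f'], ['e', ' f', 'e', ' f', 'd']) returns ['e', ' f', 'd'].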