Delete series value from row of a pandas data frame based on another data frame value

Question

My question is a little different from the question posted here

So I thought I would open a new thread. I have a pandas data frame with 5 attributes. One of these attributes is created using a pandas Series. Here is the sample code for creating the data frame:

import numpy as np
import pandas as pd
mydf1 = pd.DataFrame(columns=['group', 'id', 'name', 'mail', 'gender'])
data = np.array([2540948, 2540955, 2540956, 2540956, 7138932])
x = pd.Series(data)
mydf1.loc[0] = [1, x, 'abc', 'abc@xyz.com', 'male']

I have another data frame; the code for creating it is given below:

mydf2=pd.DataFrame(columns=['group','id'])
data1 = np.array([2540948, 2540955, 2540956])
y=pd.Series(data1)
mydf2.loc[0]=[1,y]

These are sample data. The actual data will have a large number of rows, and the series will be long too. I want to match mydf1 with mydf2 (sometimes there will be no matching element in mydf2). Where there is a match, I want to delete from id in mydf1 the values that are present in id in mydf2. For example, after the run, the id for group 1 will be 2540956, 7138932. I also tried the code mentioned in the link above, but for the first line

counts = mydf1.groupby('id').cumcount()

I got the error message TypeError: 'Series' objects are mutable, thus they cannot be hashed in Python 3.x. Can you please suggest how to solve this?

I need it very urgently. I will be glad if one of you can suggest a solution — Tanvi Mirza, Jan 14 '18 at 16:19
Can you provide more data? I cannot tell what you want from your description. — Tai, Jan 14 '18 at 16:53
How do you match? What's the criterion? Do you match by group or by id? — Tai, Jan 14 '18 at 16:54
Hi @Tai, I will match by group, which is 1 here for both data frames. Sorry, I don't have more data. But group contains unique values, and id is a pandas.Series with a large number of values. Its length can be 10K or more — Tanvi Mirza, Jan 14 '18 at 17:17
Does id need to be in order? And do you want to remove the first N items? — Tai, Jan 14 '18 at 17:20
@Tai, no need for order in id. Not the first N items. Say id in mydf1 is 1,2,3,4,5,5,7,6,6,8 and in mydf2 is 1,2,5,6,6; then id in c will be 3,4,5,7,8. Please note that the id values will be 8-digit numbers in the original data, and the ID column holds a pandas Series object. I don't have any data with me currently. I'm expecting the work very soon, and I'm preparing the code for it — Tanvi Mirza, Jan 14 '18 at 17:27

Tai · Accepted Answer · 2018-01-14T17:56:42.570

This should work. We use Counter to find the difference between the two lists of ids. (P.S. This problem does not require the difference to be in order.)

Setup

import numpy as np
import pandas as pd
from collections import Counter
mydf1=pd.DataFrame(columns=['group','id','name','mail','gender'])
x = [2540948, 2540955, 2540956,2540956,7138932]
y = [2540948, 2540955, 2540956,2540956,7138932]
mydf1.loc[0]=[1,x,'abc','abc@xyz.com','male']
mydf1.loc[1]=[2,y,'def','def@xyz.com','female']

mydf2=pd.DataFrame(columns=['group','id'])
x2 = np.array([2540948, 2540955, 2540956])
y2 = np.array([2540955, 2540956])
mydf2.loc[0]=[1,x2]
mydf2.loc[1]=[2,y2]

Code

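# Keep only the columns needed, then pair each group's ids from both frames
mydf3 = mydf1[["group", "id"]]
mydf3 = mydf3.merge(mydf2, how="inner", on="group")

# Multiset difference: drop one occurrence from id_x for each occurrence in id_y
new_id_finder = lambda x: list((Counter(x.id_x) - Counter(x.id_y)).elements())

mydf3["new_id"] = mydf3.apply(new_id_finder, axis=1)
mydf3[["group", "new_id"]]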
    group   new_id
0   1       [2540956, 7138932]
1   2       [2540948, 2540956, 7138932]

One Counter object can be subtracted from another to get the difference in occurrences of elements. Then you can use the elements method to retrieve all the values that are left.
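
As a minimal standalone sketch of the Counter subtraction, here it is applied to the small example from the comments (ids 1,2,3,4,5,5,7,6,6,8 minus 1,2,5,6,6). The helper name multiset_difference is only an illustration and is not part of pandas or of the code above.

```python
from collections import Counter


def multiset_difference(ids, ids_to_remove):
    """Remove one occurrence from ids for each occurrence in ids_to_remove.

    Hypothetical helper for illustration; ids missing from `ids` are ignored,
    because Counter subtraction keeps only positive counts.
    """
    remaining = Counter(ids) - Counter(ids_to_remove)
    return list(remaining.elements())


ids_df1 = [1, 2, 3, 4, 5, 5, 7, 6, 6, 8]
ids_df2 = [1, 2, 5, 6, 6]

result = multiset_difference(ids_df1, ids_df2)
print(sorted(result))  # [3, 4, 5, 7, 8]
```

Only one of the two 5s is removed, because 5 appears once in the second list, while both 6s are removed. The result is sorted before printing since the problem does not depend on order.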

@TanviMirza Thanks for letting me know!! — Tai, Jan 15 '18 at 06:56
