Find unique combination of pandas column out of list and another column

Question

I have a DataFrame like this:

df = pd.DataFrame({'number': [['233182801104', '862824274124', '278711320172'], ['072287346459', '278058853506'], ['233182801104', '862824274124'], None, ['123412341234']], 'country':[None, 'France', 'USA', None, 'Germany'], 'c':np.random.randn(5), 'd':np.random.randn(5)})

Which looks like:

                                       number  country         c         d
0  [233182801104, 862824274124, 278711320172]  None     0.177375 -0.226086
1  [072287346459, 278058853506]                France  -0.134511  0.551962
2  [233182801104, 862824274124]                USA      0.490095  0.770992
3  None                                        None    -0.714745  0.807898
4  [123412341234]                                Germany  1.047809  0.523591

I want all unique combinations of elements of lists in the number column and the country. Additional problem is that list can very in length and number and country can contain None:

code     country_final
233182801104     USA
862824274124     USA
278711320172     None
072287346459     France   
278058853506     France   
123412341234     Germany

As a first step I would do this to have seperate columns

a['number'].apply(pd.Series)

After that I am not sure if I have to work with groupby or some kind of pivot table.

score 0 · Answer 1 · answered Jun 16 '19 at 23:11

Try this

data = []
for i in df.itertuples():
    for j in i[1]:
        data.append( (j,i[2]) )

df2 = pd.DataFrame( data, columns =['code' , 'country_final']

OR you can condense it as :

df2 = pd.DataFrame( [ (j,i[2]) for i in df.itertuples() for j in i[1] ], columns =['code' , 'country_final']

score 0 · Answer 2 · answered Jun 17 '19 at 00:07

I am using unnesting with groupby + first

s=unnesting(df.dropna(subset=['number']),['number'])
s=s.mask(s.isnull()).groupby('number').country.first().sort_values().reset_index()
s
         number  country
0  072287346459   France
1  278058853506   France
2  123412341234  Germany
3  233182801104      USA
4  862824274124      USA
5  278711320172      NaN

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

Find unique combination of pandas column out of list and another column

2 Answers2