I am trying to obtain a dataframe which has all the combinations of two individual columns in two different dataframes. My dataframes look like this:
>>>first_df >>>second_df
id test id text
0 1 abc 0 11 uvw
1 2 def 1 22 xyz
2 3 ghi
From this, I was able to obtain the combinations using this approach:
df = pd.DataFrame(list(itertools.product(list(a['test']),list(b['text']))),columns=['test','text'])
>>>df
test text
0 abc uvw
1 abc xyz
2 def uvw
3 def xyz
4 ghi uvw
5 ghi xyz
What I am unable to understand is,how do I get the relevant id columns also into my dataframe so it looks like:
>>>df
id test text kid
0 1 abc uvw 11
1 1 abc xyz 22
2 2 def uvw 11
3 2 def xyz 22
4 3 ghi uvw 11
5 3 ghi xyz 22
I tried making combinations on the id columns separately
df1 =pd.DataFrame(list(itertools.product(list(a['id']),list(a['id']))),columns=['id','id'])
df
id id
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2
8 3 3
df2 =pd.DataFrame(list(itertools.product(list(b['kid']),list(b['kid']))),columns=['kid','kid'])
>>>df2
id kid
0 11 11
1 11 22
2 22 11
3 22 22
Which I then tried to concatenate..This obviously failed
df = pd.concat([df['id'],df2,df1['kid']],axis=1)
>>> df
id test text kid
0 1 abc uvw 11.0
1 1 abc xyz 22.0
2 1 def uvw 11.0
3 2 def xyz 22.0
4 2 ghi uvw NaN
5 2 ghi xyz NaN
6 3 NaN NaN NaN
7 3 NaN NaN NaN
8 3 NaN NaN NaN
I have a feeling that I can maybe solve this using apply
function of dataframes but I just cant figure out how. Any leads would be appreciated. Thank you for reading so much :)