I have several data sources that I'm trying to work with. I asked a related question a couple of days ago.
I have 3 DataFrames, each with a 'user_id' column that is common across all three, but the DataFrames are not all the same size.
I didn't realize this at first and used pd.concat to combine them, but the rows aren't lined up by user_id, and I'm not sure how to accomplish that.
Here is some sample data from each, along with sample output from the resulting concat (perhaps that is helpful?):
df1:
user_id duration
0 1000 116.830000
1 1001 328.092000
2 1002 259.043333
3 1003 1041.000000
4 1004 327.368750
5 1005 470.220000
6 1006 32.055000
7 1007 496.830000
8 1008 491.103333
9 1009 698.710000
df2:
user_id mb_used
0 1000 1902.000000
1 1001 16088.200000
2 1002 13432.000000
3 1003 27045.000000
4 1004 19544.500000
5 1005 17141.000000
6 1006 17094.000000
7 1007 28770.800000
8 1008 18491.333333
9 1009 23405.125000
df3:
user_id id
0 1000 11.000000
1 1001 41.400000
2 1002 29.333333
3 1003 50.000000
4 1004 22.125000
5 1005 11.000000
6 1006 77.000000
7 1007 51.000000
8 1008 28.000000
9 1011 53.000000
df4 = pd.concat([df1, df2, df3], axis=1)
df4 result:
user_id duration user_id mb_used user_id id
0 1000.0 116.830000 1000 1902.000000 1000.0 11.000000
1 1001.0 328.092000 1001 16088.200000 1001.0 41.400000
2 1002.0 259.043333 1002 13432.000000 1002.0 29.333333
3 1003.0 1041.000000 1003 27045.000000 1003.0 50.000000
4 1004.0 327.368750 1004 19544.500000 1004.0 22.125000
5 1005.0 470.220000 1005 17141.000000 1005.0 11.000000
6 1006.0 32.055000 1006 17094.000000 1006.0 77.000000
7 1007.0 496.830000 1007 28770.800000 1007.0 51.000000
8 1008.0 491.103333 1008 18491.333333 1008.0 28.000000
**9 1009.0 698.710000 1009 23405.125000 1011.0 53.000000**
Is there something I've done wrong, or something I could add to line the rows up by that shared user_id, or should I be using a different method? I'll be honest: I started with pd.merge but quickly realized I was in over my head trying to structure it. If that is the only way (or the best way), I'll take another crack at it.
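For reference, here is a minimal sketch of the two approaches I think might apply, using only the last three rows of each sample frame above. It assumes an outer join is what's wanted (keep every user_id, with NaN where a frame has no row for that user); how="inner" would instead keep only users present in all three frames. The variable names merged and aligned are just placeholders.

```python
from functools import reduce

import pandas as pd

# Small subset of the sample data above (last three rows of each frame).
df1 = pd.DataFrame({"user_id": [1007, 1008, 1009],
                    "duration": [496.83, 491.103333, 698.71]})
df2 = pd.DataFrame({"user_id": [1007, 1008, 1009],
                    "mb_used": [28770.8, 18491.333333, 23405.125]})
df3 = pd.DataFrame({"user_id": [1007, 1008, 1011],
                    "id": [51.0, 28.0, 53.0]})

# Option 1: chain pd.merge on the shared key.
# how="outer" is an assumption: it keeps every user_id seen in any frame.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="user_id", how="outer"),
    [df1, df2, df3],
)

# Option 2: make user_id the index so pd.concat aligns rows on it
# instead of on the default 0..n positional index.
aligned = (
    pd.concat([df.set_index("user_id") for df in (df1, df2, df3)], axis=1)
    .rename_axis("user_id")
    .reset_index()
)

print(merged)
print(aligned)
```

Either way there should be a single user_id column, with user 1009 missing an id value and user 1011 missing duration and mb_used, rather than the two sharing a row.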
Thanks in advance for your time, and I apologize for what is likely a lack of proper terminology. I am quite new to Python (and programming in general).