I have two dataframes that I need to merge/join on a column. When I try to join/merge them, the new columns contain only NaN.

Basically, I need to perform a left join on the dataframes, with df_user as the left dataframe.

PS: The join column has the same datatype in both dataframes.

Please find the dataframes below:

df_user.dtypes

App                       category
Sentiment                     int8
Sentiment_Polarity         float64
Sentiment_Subjectivity     float64

df_play.dtypes
App               category
Category          category
Rating             float64
Reviews            float64
Size               float64
Installs             int64
Type                  int8
Price              float64
Content Rating        int8
Installs_Cat          int8


df_play.head()

   App         Category        Rating  Reviews  Size  Installs  Type  Price  Content Rating  Installs_Cat
0  SPrapBook   ART_AND_DESIGN  4.1     159      19    10000     0     0      0               9
1  U Launcher  ART_AND_DESIGN  4.5     87510    25    5000000   0     0      0               14
2  Sketch -    ART_AND_DESIGN  4.3     215644   2.8   50000000  0     0      1               16
3  Pixel Dra   ART_AND_DESIGN  4.4     967      5.6   100000    0     0      0               11
4  Paper flo   ART_AND_DESIGN  3.8     167      19    50000     0     0      0               10


df_user.head()


                App           Sentiment     Sentiment_Polarity  Sentiment_Subjectivity
0   10 Best Foods for You         2                1.00              0.533333
1   10 Best Foods for You         2                0.25              0.288462
3   10 Best Foods for You         2                0.40              0.875000
4   10 Best Foods for You         2                1.00              0.300000
5   10 Best Foods for You         2                1.00              0.300000

I tried both of the snippets below:

result = pd.merge(df_user, df_play, how='left', on='App')
result = df_user.join(df_play.set_index('App'),on='App',how='left',rsuffix='_y')

But all I got was:

   App                    Sentiment  Sentiment_Polarity  Sentiment_Subjectivity  Category  Rating  Reviews  Size  Installs  Type  Price  Content Rating  Installs_Cat
0  10 Best Foods for You  2          1.00                0.533333                NaN       NaN     NaN      NaN   NaN       NaN   NaN    NaN             NaN
1  10 Best Foods for You  2          0.25                0.288462                NaN       NaN     NaN      NaN   NaN       NaN   NaN    NaN             NaN
2  10 Best Foods for You  2          0.40                0.875000                NaN       NaN     NaN      NaN   NaN       NaN   NaN    NaN             NaN
3  10 Best Foods for You  2          1.00                0.300000                NaN       NaN     NaN      NaN   NaN       NaN   NaN    NaN             NaN
4  10 Best Foods for You  2          1.00                0.300000                NaN       NaN     NaN      NaN   NaN       NaN   NaN    NaN             NaN

Please excuse the formatting.

  • Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) – wwnde Apr 27 '20 at 02:13
  • No. I am basically trying to perform a left join on both dataframes, taking df_user as the dataframe on the left. – Abhishek Ray Apr 27 '20 at 02:19
  • Well `result = pd.merge(df_user, df_play, how='left', on='App')` this is correct, but I don't see on the head of `df_play` any input for App == `10 Best Foods for You`, so `NaN` value will be assigned. If the output of `result.Category.any()` is `False` then you have no common `App` between the dataframes. – jcaliz Apr 27 '20 at 04:12
  • Please output the categories as lists and check if there are any whitespace differences. – cs95 Apr 27 '20 at 04:15
  • @jcaliz df_play has more than 9000 rows and df_user has more than 37000 rows, and they do have a lot of rows in common. I have already tried merging them in R-Studio and it works. – Abhishek Ray Apr 27 '20 at 11:10
  • @cs95 No, there are no whitespaces, that has already been checked. I have merged both the dataframes in R-Studio and MS Excel. It works there! – Abhishek Ray Apr 27 '20 at 11:11
  • Can you provide a working example from subsets of your dfs? – Michael McFarlane Apr 04 '22 at 23:59
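
The checks suggested in the comments above (is there any key overlap at all, and do the keys differ only by whitespace) could be sketched roughly as follows. This is only an illustration: the two small frames are invented stand-ins for df_user and df_play, the trailing space is planted on purpose, and normalize_key is a hypothetical helper. Whether whitespace is the actual cause in the real data is not established by the question.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_user and df_play (values invented).
df_user = pd.DataFrame({
    "App": pd.Categorical(["10 Best Foods for You", "10 Best Foods for You", "Sketch"]),
    "Sentiment": [2, 2, 1],
})
df_play = pd.DataFrame({
    # Trailing space planted on purpose to imitate a hidden key mismatch.
    "App": pd.Categorical(["10 Best Foods for You ", "Sketch"]),
    "Rating": [4.0, 4.3],
})

def normalize_key(s):
    """Hypothetical helper: cast the key to plain strings and strip whitespace."""
    return s.astype(str).str.strip()

# Check 1: which raw key values appear in both frames?
common_raw = set(df_user["App"].astype(str)) & set(df_play["App"].astype(str))

# Check 2: left merge on the raw keys; indicator=True adds a '_merge' column
# saying whether each row matched ('both') or not ('left_only').
raw = df_user.merge(df_play, how="left", on="App", indicator=True)
raw_unmatched = int((raw["_merge"] == "left_only").sum())

# Check 3: the same merge after normalizing the key on both sides.
user_clean = df_user.assign(App=normalize_key(df_user["App"]))
play_clean = df_play.assign(App=normalize_key(df_play["App"]))
clean = user_clean.merge(play_clean, how="left", on="App", indicator=True)
clean_unmatched = int((clean["_merge"] == "left_only").sum())

print("keys shared before cleaning:", sorted(common_raw))
print("unmatched rows before / after cleaning:", raw_unmatched, "/", clean_unmatched)
```

If the unmatched count stays the same after normalizing, whitespace is probably not the problem, and comparing a few key values with repr() on both sides may show other differences such as case or non-printing characters.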

0 Answers