0

I would like to replicate in Pandas the following SQL structure "Update dataframe1 LEFT JON dataframe2 SET dataframe1.column1 = dataframe2.column2 WHERE dataframe1.column3 > X"

I know it is possible to merge the dataframe and then work on the merged columns with with ".where"

However, it doesn't seems to be straighforward as a solution.

df = pd.merge(df1,df2, suffix(a,b))
df['clmn1'] = df['clmn1_b'].where( df[clmn1]>0, df['clmn1_b'])

Is there a better way to reach the goal? Thanks

m.falconelli
  • 61
  • 11
  • Are you joining on a particular index, or on several overlapping columns? – Andy Hayden May 14 '19 at 22:25
  • 1
    I think `map` do work here, or set_index() – BENY May 14 '19 at 22:25
  • i think the same as WeNYoBen. Both `map` and `replace` would work – Andy L. May 14 '19 at 22:38
  • @AndyHayden I refer to a generic approach to adopt, some times there are indexes, in other cases there are not indexes. For the general case let's assume you have a column with client name and country, "client list": [Name, Gender, country] and list of field: [[Jack, M, USA], [Nick, M, UK], [Alphio, F, RU], [Jenny, F, USA]] then you have another table with the "abbreviation": [USA: United states, UK, United Kingdom, etc..] .. The use case I have in mind is to update table "client list" replacing the abbreviation with the long country's name, when the gender is Male (M). – m.falconelli May 15 '19 at 09:41
  • @WeNYoBen How would you combine merge and map? – m.falconelli May 15 '19 at 09:43
  • I found this answer that define a better way (I believe) to replace the ".where" method. [link](https://stackoverflow.com/a/36910033/5806136) using .loc; however apparently is still requiring a first merging of the two DFs and then a work on the new columns – m.falconelli May 15 '19 at 09:56

1 Answers1

1

To use your example from the comments:

In [21]: df
Out[21]:
     Name Gender country
0    Jack      M     USA
1    Nick      M      UK
2  Alphio      F      RU
3   Jenny      F     USA

In [22]: country_map =  {'USA': 'United States', 'UK': 'United Kingdom', 'RU': 'Russia'}

In [23]: df.country.map(country_map)
Out[23]:
0     United States
1    United Kingdom
2            Russia
3     United States
Name: country, dtype: object

To update just the M rows you could use loc and update:

In [24]: df.country.update(df[df.Gender == 'M'].country.map(country_map))

In [25]: df
Out[25]:
     Name Gender         country
0    Jack      M   United States
1    Nick      M  United Kingdom
2  Alphio      F              RU
3   Jenny      F             USA
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535