1

I am trying to translate an SQL query into python/pandas, that is, to set new values on a certain subset of a pandas DataFrame, based on a condition and an inner join of two tables. The original SQL query is:

UPDATE table1
INNER JOIN table2
ON table1.colA = table2.colB
SET table1.colX = table2.colZ
WHERE table1.colX Is Null;

So far I've got the pandas merge call shown below, but I'm not 100% sure it produces the right results, or whether the code is pythonic enough. Current pandas syntax:

table1.loc[table1['colX'].isnull()]['colX'] = table1.merge(table2[['colB','colZ']],
                                                   how='inner', left_on=['colA'],
                                                   right_on=['colB'])['colZ']

Any ideas on how to optimize this?

Oskar_U
  • 472
  • 4
  • 13
  • Is it possible to post a small example dataset and what your expected output is? – Erfan Nov 24 '19 at 21:04
  • 1
    The current form of the pandas merge in your question does not guarantee that the number of records on the right-hand side will equal the number of null records in `table1` (the merge on the right will have length `len(table1)`, while you're referencing fewer indices on the left with the `isnull`). You could fix that... but I find it easier to break this sort of statement into two steps: (1) merge the two dataframes to get the full `colZ` into `table1`, then (2) set the null values of `colX` from `colZ`, see https://stackoverflow.com/a/30299183/9357244 – chris Nov 24 '19 at 22:42
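The two-step approach from the comment above could be sketched as follows, using the column names from the question and a made-up example dataset (a `map` against `table2` indexed by `colB` plays the role of the merge, assuming `colB` values are unique):

```python
import pandas as pd

# Hypothetical example data mirroring the question's column names.
table1 = pd.DataFrame({'colA': [1, 2, 3], 'colX': [10.0, None, None]})
table2 = pd.DataFrame({'colB': [2, 3], 'colZ': [20.0, 30.0]})

# Step 1: look up colZ for each row of table1 by matching colA against colB.
colz = table1['colA'].map(table2.set_index('colB')['colZ'])

# Step 2: fill only the null values of colX, leaving existing values intact.
table1['colX'] = table1['colX'].fillna(colz)
```

With the example data, `table1['colX']` ends up as `[10.0, 20.0, 30.0]`: the existing value is untouched and the two nulls are filled from `colZ`.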

1 Answer

2

The one-liner solution you're looking for is:

table1.loc[table1['colX'].isna(), 'colX'] = table1.merge(table2, how='left', left_on='colA', right_on='colB')['colZ']

Since you only want to update rows where table1.colX Is Null (SQL), there is no need to stick with an inner join: a left join keeps the merge result the same length as table1 and aligned with it, which is what the assignment to the actual colX values relies on.

DziubAs
  • 21
  • 1