How to append data to dataframe based on another column?

Question

I would like to append values from one dataframe into another by checking if they contain the same Document Number.

ip_df:

        CardName     DocNum    DocDate      DocTotal    DocNum2    PaidToDate   Balance
0       CompanyA  800100001 2021-03-01  10000.000000  920000000  10000.000000  0.000000
1       CompanyA  800100002 2021-03-01  20000.000000  920000000  20000.000000  0.000000
2       CompanyA  800100003 2021-03-01  30000.000000  920000000  30000.000000  0.000000
3       CompanyA  800100004 2021-03-01  40000.000000  920000000  40000.000000  0.000000
4       CompanyA  800100005 2021-03-01  50000.000000  920000000  50000.000000  0.000000
..      ...        ...        ...           ...        ...           ...       ...
94      CompanyY  800100006 2021-03-01  60000.000000  920000005  60000.000000  0.000000
95      CompanyY  800100007 2021-03-01  70000.000000  920000005  70000.000000  0.000000
96      CompanyY  800100008 2021-03-01  80000.000000  920000005  80000.000000  0.000000
97      CompanyZ  800100009 2021-03-01  90000.000000  920000006  90000.000000  0.000000
98      CompanyZ  800100010 2021-03-01  11000.000000  920000006  11000.000000  0.000000

[99 rows x 7 columns]

ar_df:

        BPCode       Balance Currency    DueDate  BPName  TransId       Ref1      Payment Received  Bank Charge
0     XXXXXXXX  10000.000000      USD 2020-09-29  CompanyA   503378  800100001          0            0
1     XXXXXXXX  20000.000000      USD 2021-03-01  CompanyA   543103  800100002          0            0
2     XXXXXXXX  30000.000000      USD 2021-03-01  CompanyA   543171  800100003          0            0
3     XXXXXXXX  40000.000000      USD 2021-03-01  CompanyA   544205  800100004          0            0
4     XXXXXXXX  50000.000000      USD 2021-03-01  CompanyA   544222  800100005          0            0
...        ...           ...      ...        ...  ...      ...        ...               ...          ...
3763  XXXXXXXX  60000.000000      USD 2021-03-02  CompanyY   548612  800100006          0            0
3764  XXXXXXXX  70000.000000      USD 2021-03-02  CompanyY   547727  800100007          0            0
3765  XXXXXXXX  80000.000000      USD 2021-03-30  CompanyY   553819  800100008          0            0
3766  XXXXXXXX  90000.000000      USD 2021-04-01  CompanyZ   547707  800100009          0            0
3767  XXXXXXXX  11000.000000      USD 2021-04-29  CompanyZ   556102  800100010          0            0

[3768 rows x 9 columns]

I am trying to do this:

for row in ip_df:
if ip_df.row['DocNum'] == ar_df.row['Ref1']:
    ap_df.row['Payment Received'] = ip_df.row['PaidToDate']

But I am not too sure of the correct way to doing this.

I have tried searching around, but mostly, the answers always seem to point towards checking against scalar values only.

What I want to achieve is checking if in a row in ip_df, whether 'DocNum' can be found in the ap_df's 'Ref1' column, and if true, set ap_df's 'Payment Received' row to be a value from ip_df's 'PaidToDate' row.

The end result should look like:

        BPCode       Balance Currency    DueDate  BPName  TransId       Ref1      Payment Received  Bank Charge
0     XXXXXXXX  10000.000000      USD 2020-09-29  CompanyA   503378  800100001          10000.000000 0
1     XXXXXXXX  20000.000000      USD 2021-03-01  CompanyA   543103  800100002          20000.000000 0
2     XXXXXXXX  30000.000000      USD 2021-03-01  CompanyA   543171  800100003          30000.000000 0
3     XXXXXXXX  40000.000000      USD 2021-03-01  CompanyA   544205  800100004          40000.000000 0
4     XXXXXXXX  50000.000000      USD 2021-03-01  CompanyA   544222  800100005          50000.000000 0
...        ...           ...      ...        ...  ...      ...        ...               ...          ...
3763  XXXXXXXX  60000.000000      USD 2021-03-02  CompanyY   548612  800114258          60000.000000 0
3764  XXXXXXXX  70000.000000      USD 2021-03-02  CompanyY   547727  800113975          70000.000000 0
3765  XXXXXXXX  80000.000000      USD 2021-03-30  CompanyY   553819  800115292          80000.000000 0
3766  XXXXXXXX  90000.000000      USD 2021-04-01  CompanyZ   547707  800113957          90000.000000 0
3767  XXXXXXXX  11000.000000      USD 2021-04-29  CompanyZ   556102  800115741          11000.000000 0

[3768 rows x 9 columns]

Thank you for any help in advance!!

`pd.merge`? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html — Epsi95, Mar 16 '21 at 04:24
Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — sushanth, Mar 16 '21 at 04:25
Thank you for pointing me to this resource! It most definitely helped me to understand the merge operations. Please hold as I try to implement the ideas given in this resource! — Beelz, Mar 16 '21 at 05:54
While this was not the answer to my question, it answered further questions in my head when dealing with other merge operations. Thank you so much for this awesome resource! — Beelz, Mar 17 '21 at 00:48

Mayank Porwal · Accepted Answer · 2021-03-17T05:17:53.700

2

Use df.merge on a subset of columns with df.rename:

In [220]: ar_df['Ref1'] = ar_df['Ref1'].str.replace('', np.nan).astype(int)
In [221]: ip_df['DocNum'] = ar_df['DocNum'].str.replace('', np.nan).astype(int)

In [222]: ar_df.merge(ip_df[['DocNum', 'PaidToDate']], left_on='Ref1', right_on='DocNum').drop(['Payment Received', 'DocNum'], 1).rename(columns={'PaidToDate':'Payment Received'})
Out[222]: 
     BPCode  Balance Currency     DueDate    BPName  TransId       Ref1  Bank_Charge  Payment_Received
0  XXXXXXXX  10000.0      USD  2020-09-29  CompanyA   503378  800100001            0           10000.0
1  XXXXXXXX  20000.0      USD  2021-03-01  CompanyA   543103  800100002            0           20000.0
2  XXXXXXXX  30000.0      USD  2021-03-01  CompanyA   543171  800100003            0           30000.0
3  XXXXXXXX  40000.0      USD  2021-03-01  CompanyA   544205  800100004            0           40000.0
4  XXXXXXXX  50000.0      USD  2021-03-01  CompanyA   544222  800100005            0           50000.0

edited Mar 17 '21 at 05:17

answered Mar 16 '21 at 04:25

Mayank Porwal

33,470
8
37
58

Hello! This looks great! I kinda understand what you are trying to do. However, I am getting a ValueError: "You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat". The error seems to be flagging out the merge statement. Any ideas on how I can circumvent this? Would casting the object to an integer help? – Beelz Mar 16 '21 at 05:52
Yes. You can cast both values to integer using `astype`. – Mayank Porwal Mar 16 '21 at 12:03
Awesome. I managed to do that and it solved my problem. One extra thing to note is that my columns had empty strings: '', and it caused another error to surface: ValueError: invalid literal for int() with base 10: ''. Mayank, if you could kindly edit your post to handle these two errors, I would gladly take your reply as the answer! :) – Beelz Mar 17 '21 at 00:47
@Beelz I've edited my answer. Please check. – Mayank Porwal Mar 17 '21 at 05:18
Spot on, Mayank! Thank you!! – Beelz Mar 18 '21 at 01:26

How to append data to dataframe based on another column?

1 Answers1