
Pandas 1.1.4

MRE:

mre_df = pd.DataFrame({"dates":["2021-05-01", "2021-05-02", "2021-05-03","2021-05-04"],
                      "click":[3,2,3,4],
                      "imps":[123,122,444,443]})
mre_df["dates"] = pd.to_datetime(mre_df["dates"], format="%Y-%m-%d")
mre_df.set_index("dates", inplace=True)
mre_df["ctr"] = mre_df["click"]/mre_df["imps"] 

mre_df:

            click  imps       ctr
dates
2021-05-01      3   123  0.024390
2021-05-02      2   122  0.016393
2021-05-03      3   444  0.006757
2021-05-04      4   443  0.009029

I want to multiply ctr by a value, but only for rows after "2021-05-02", and store the result in a new column.

This is my attempt, but I am looking for a more stable, clean, and efficient way. EDIT: The SettingWithCopyWarning part has been fixed thanks to Henry Ecker.

mre_df["rel_ctr"] = mre_df["ctr"]
mre_df.loc["2021-05-03":, "rel_ctr"] = mre_df.loc["2021-05-03":, "rel_ctr"] * 1.2

Output:

            click  imps       ctr   rel_ctr
dates
2021-05-01      3   123  0.024390  0.024390
2021-05-02      2   122  0.016393  0.016393
2021-05-03      3   444  0.006757  0.008108
2021-05-04      4   443  0.009029  0.010835
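The copy-then-scale step shown above can also be written with an in-place `*=` on the `.loc` selection. This is only a minimal sketch on the same MRE data; the names `factor` and `start` are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

# Rebuild the MRE frame from the question.
mre_df = pd.DataFrame(
    {
        "dates": ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04"],
        "click": [3, 2, 3, 4],
        "imps": [123, 122, 444, 443],
    }
)
mre_df["dates"] = pd.to_datetime(mre_df["dates"], format="%Y-%m-%d")
mre_df = mre_df.set_index("dates")
mre_df["ctr"] = mre_df["click"] / mre_df["imps"]

factor = 1.2          # illustrative multiplier
start = "2021-05-03"  # first date to scale; label slices on a DatetimeIndex include the start

# Copy the column once, then scale only the selected rows in place.
mre_df["rel_ctr"] = mre_df["ctr"]
mre_df.loc[start:, "rel_ctr"] *= factor

print(mre_df)
```

The slice relies on the index being sorted by date, as it is in the MRE.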
haneulkim
  • 1
    The `SettingWithCopyWarning` is pretty clear, in this case, about how to handle it... "Try using .loc[row_indexer,col_indexer] = value instead" -> `mre_df.loc["2021-05-03":, "rel_ctr"] = mre_df.loc["2021-05-03":, "rel_ctr"] * 1.2` – Henry Ecker Jul 22 '21 at 02:35
  • 1
    @HenryEcker Oh yes, didn't see it before. Thanks this is perfect! – haneulkim Jul 22 '21 at 02:37
  • 1
    Does this answer your question? [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – Henry Ecker Jul 22 '21 at 02:38
  • Yes, this solves the problem you've just helped me with. However, I am still looking for a more efficient way: I don't like copying a column to make a new column and then applying the multiplication to a subset of rows. I wonder if I can do this without copying. – haneulkim Jul 22 '21 at 02:41

2 Answers


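Try pandas where, which keeps the original value where the condition is True and uses the replacement everywhere else. Note that assign returns a new DataFrame, so assign the result back if you want to keep the column:

mre_df.assign(rel_ctr = mre_df.ctr.where(mre_df.index<="2021-05-02", 
                                         mre_df.ctr*1.2)
                )

            click  imps       ctr   rel_ctr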
dates                                      
2021-05-01      3   123  0.024390  0.024390
2021-05-02      2   122  0.016393  0.016393
2021-05-03      3   444  0.006757  0.008108
2021-05-04      4   443  0.009029  0.010835
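A related variation, not taken from the answer above, builds a per-row scale factor with `numpy.where` and multiplies the whole column once, so no column is copied and then partially overwritten. This is a minimal sketch, assuming NumPy is available; `cutoff` and `scale` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the MRE frame from the question.
mre_df = pd.DataFrame(
    {
        "dates": ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04"],
        "click": [3, 2, 3, 4],
        "imps": [123, 122, 444, 443],
    }
)
mre_df["dates"] = pd.to_datetime(mre_df["dates"], format="%Y-%m-%d")
mre_df = mre_df.set_index("dates")
mre_df["ctr"] = mre_df["click"] / mre_df["imps"]

cutoff = pd.Timestamp("2021-05-02")  # rows strictly after this date are scaled

# 1.2 for rows after the cutoff, 1.0 (no change) for all other rows.
scale = np.where(mre_df.index > cutoff, 1.2, 1.0)
mre_df["rel_ctr"] = mre_df["ctr"] * scale

print(mre_df)
```

Because the factor is an array aligned by position with the index, this form does not depend on the index being sorted.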
sammywemmy

I don't know if it's any more efficient, but you could use apply with axis=1 to evaluate the condition row by row.

import pandas as pd
import datetime

mre_df = pd.DataFrame(
    {
        "dates": ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04"],
        "click": [3, 2, 3, 4],
        "imps": [123, 122, 444, 443],
    }
)

mre_df["dates"] = pd.to_datetime(mre_df["dates"], format="%Y-%m-%d")
mre_df.set_index("dates", inplace=True)
mre_df["ctr"] = mre_df["click"] / mre_df["imps"]

# create rel_ctr column
mre_df["rel_ctr"] = mre_df.apply(
    lambda rw: rw["ctr"] * 1.2
    if rw.name >= datetime.datetime.strptime("2021-05-03", "%Y-%m-%d")
    else rw["ctr"],
    axis=1,
)

print(mre_df)

""" Sample Output

            click  imps       ctr   rel_ctr
dates
2021-05-01      3   123  0.024390  0.024390
2021-05-02      2   122  0.016393  0.016393
2021-05-03      3   444  0.006757  0.008108
2021-05-04      4   443  0.009029  0.010835

"""
   
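For comparison, the row-wise lambda can be expressed as a single vectorized call with `Series.mask`, which replaces values where the condition is True. Row-wise `apply` calls the Python function once per row, so it is typically slower on large frames. This is a minimal sketch, not the answerer's code; it uses `pd.Timestamp` in place of `datetime.strptime`, and `cutoff` and `on_or_after` are illustrative names.

```python
import pandas as pd

# Rebuild the MRE frame from the question.
mre_df = pd.DataFrame(
    {
        "dates": ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04"],
        "click": [3, 2, 3, 4],
        "imps": [123, 122, 444, 443],
    }
)
mre_df["dates"] = pd.to_datetime(mre_df["dates"], format="%Y-%m-%d")
mre_df = mre_df.set_index("dates")
mre_df["ctr"] = mre_df["click"] / mre_df["imps"]

cutoff = pd.Timestamp("2021-05-03")   # first date to scale
on_or_after = mre_df.index >= cutoff  # boolean array, one entry per row

# Vectorized: replace ctr with ctr * 1.2 wherever the condition holds.
vectorized = mre_df["ctr"].mask(on_or_after, mre_df["ctr"] * 1.2)

# Row-wise version from the answer, kept only to check that both agree.
row_wise = mre_df.apply(
    lambda rw: rw["ctr"] * 1.2 if rw.name >= cutoff else rw["ctr"],
    axis=1,
)

mre_df["rel_ctr"] = vectorized
print(mre_df)
```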
norie