0

I'm trying to convert some VBA scripts into a python script, and I have been having troubles trying to figure some few things out, as the results seem different from what the excel file gives. So I have an example dataframe like this :

|Name    |  A_Date    |
_______________________
|RAHEAL  |  04/30/2020|
|GIFTY   |  05/31/2020|
||ERIC   |  03/16/2020|
|PETER   |  05/01/2020|
|EMMANUEL|  12/15/2019| 
|BABA    |  05/23/2020|

and I would want to achieve this result(VBA script result) :

|Name    |  A_Date    | Sold
__________________________________
|RAHEAL  |  04/30/2020| No
|GIFTY   |  05/31/2020| Yes
||ERIC   |  03/16/2020| No
|PETER   |  05/01/2020| Yes
|EMMANUEL|  12/15/2019| No
|BABA    |  05/23/2020| Yes

By converting this VBA script :

Range("C2").Select
    Selection.Value = _
        "=IF(RC[-1]>=(INT(R2C2)-DAY(INT(R2C2))+1),""Yes"",""No"")"
    Selection.AutoFill Destination:=Range("C2", "C" & Cells(Rows.Count, 1).End(xlUp).Row)
    Range("C1").Value = "Sold"
    ActiveSheet.Columns("C").Copy
    ActiveSheet.Columns("C").PasteSpecial xlPasteValues

Simply :=IF(B2>=(INT($B$2)-DAY(INT($B$2))+1),"Yes","No")

To this Python script:

sales['Sold']=np.where(sales['A_Date']>=(sales['A_Date'] - pd.to_timedelta(sales.A_Date.dt.day, unit='d'))+ timedelta(days=1),'Yes','No') 

But I keep getting a "Yes" throughout.... could anyone help me spot out where I might have made some kind of mistake

Dela
  • 115
  • 2
  • 12
  • Why not try an automated converter? Thats what I'd try first, and then see if it works and maybe clean up the code – wamster Jan 10 '21 at 16:28
  • @wamster can you suggest any of the good ones out there, please ? – Dela Jan 10 '21 at 16:32
  • Sorry I have no experience with this, but a quick google search shows [this](https://sourceforge.net/projects/vb2py/) might fit your bill. The main thing here is to make sure its been recently updated, and this has been. Otherwise if its like a 5 years old it might convert to python2 instead – wamster Jan 10 '21 at 16:39
  • Your VBA is referencing column L but your dataframe only has 2 columns initially. And your VBA fomula doesn't match with your Excel formula. If your Excel formula is correct then your first desired output Sold value is wrong. – QHarr Jan 10 '21 at 19:58

2 Answers2

2
import pandas as pd

df = pd.DataFrame({'Name':['RAHEAL','GIFTY','ERIC','PETER','EMMANUEL','BABA'],
                   'A_Date':['04/30/2020','05/31/2020','03/16/2020',
                             '05/01/2020','12/15/2019','05/23/2020']})

df['A_Date'] = pd.to_datetime(df['A_Date'])
print(df)
df['Sold'] = df['A_Date'] >= df['A_Date'].iloc[0].replace(day=1)
df['Sold'] = df['Sold'].map({True:'Yes', False:'No'})
print(df)

output:

       Name     A_Date
0    RAHEAL 2020-04-30
1     GIFTY 2020-05-31
2      ERIC 2020-03-16
3     PETER 2020-05-01
4  EMMANUEL 2019-12-15
5      BABA 2020-05-23

       Name     A_Date Sold
0    RAHEAL 2020-04-30  Yes
1     GIFTY 2020-05-31  Yes
2      ERIC 2020-03-16   No
3     PETER 2020-05-01  Yes
4  EMMANUEL 2019-12-15   No
5      BABA 2020-05-23  Yes

If I read the formula right - if A_Date value >= 04/01/2020 (i.e. first day of month for date in B2), so RAHEAL should be Yes too

I don't know if you noticed (and if this is intended), but if A_Date value has a fractional part (i.e. time), when you calculate the value for 1st of the month, there is room for error. If the time in B2 is let's say 10:00 AM, when you calculate cut value, it will be 04/1/2020 10:00. Then if you have another value, let's say 04/01/2020 09:00, it will be evaluated as False/No. This is how it works also in your Excel formula.

EDIT (12 Jan 2021): Note, values in column A_Date are of type datetime.datetime or datetime.date. Presumably they are converted when reading the Excel file or explicitly afterwards.

buran
  • 13,682
  • 10
  • 36
  • 61
  • + This more like what I was expecting as output. Do we know data type for A_Date? – QHarr Jan 10 '21 at 20:01
  • 1
    @QHarr, Based on the formula using `DAY` I assume it's a Date in Excel, but in any case I would have converted it (and I did) to `datetime` object in the dataframe. They can always cast it to `str` if prefer. – buran Jan 10 '21 at 20:04
  • When trying I got `TypeError: replace() takes no keyword arguments` – YasserKhalil Jan 12 '21 at 10:59
  • 1
    @YasserKhalil it's difficult to tell what your problem is without seeing your code. What I show is my code and the output. If you can share some code and full traceback. Note that, values of `A_Date` columns are `datetime.datetime` or `datetime.date` objects, not `str`. I use [datetime.datetime.replace()](https://docs.python.org/3/library/datetime.html#datetime.datetime.replace) – buran Jan 12 '21 at 11:19
  • Thanks a lot. I just used the line in the QHarr's code `pd.DataFrame({'Name':['RAHEAL','GIFTY','ERIC','PETER','EMMANUEL','BABA'], 'A_Date':['2020-04-30','2020-05-31','2020-03-16','2020-05-01','2019-12-15','2020-05-23']})`. Is there a way to convert it to be an object instead of str? – YasserKhalil Jan 12 '21 at 11:23
  • 1
    @YasserKhalil, check https://stackoverflow.com/q/28133018/4046632 I edited my answer to explicitly mention it. – buran Jan 12 '21 at 11:25
  • What does this part mean `.iloc[0].replace(day=1)`? I think this deals with the first date and return the first day of the month for that date .. Am I right? – YasserKhalil Jan 12 '21 at 11:42
  • 1
    @YasserKhalil, you are right, `iloc(0)` will return first value (i.e. index 0) from the column (see [this](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html)). The formula they want to convert from excel reference $B$2 (cell B1 is the header). `replace()` - there is link in my previous comment. – buran Jan 12 '21 at 11:45
  • I have played around (as I am niewbie at python stuff) and made it in one line `df['Sold'] = (df['A_Date'] >= df['A_Date'].iloc[0].replace(day=1)).map({True:'Yes', False:'No'})`. Just for fun :) – YasserKhalil Jan 12 '21 at 11:48
  • 1
    @YasserKhalil, yes, it can be done in one line, but it's less readable and readability counts. It will be easier to follow the code in slow motion instead of one-liner. At least that was the reason why I did it on separate lines. – buran Jan 12 '21 at 11:49
1

Very much embarassed I didn't see the simple elegant solution that buran gave +. I did more of a literal translation.

first_date.toordinal() - 693594 is the integer date value for your initial date, current_date.toordinal() - 693594 is the integer date value for the current iteration of the dates column. I apply your cell formula logic to each A_Date row value and output as the corresponding Sold column value.

import pandas as pd
from datetime import datetime

def is_sold(current_date:datetime, first_date:datetime, day_no:int)->str:
    # use of toordinal idea from @rjha94 https://stackoverflow.com/a/47478659
    if current_date.toordinal() - 693594 >= first_date.toordinal() - 693594 - day_no + 1:
        return "Yes" 
    else:
        return "No"

sales = pd.DataFrame({'Name':['RAHEAL','GIFTY','ERIC','PETER','EMMANUEL','BABA'],
                      'A_Date':['2020-04-30','2020-05-31','2020-03-16','2020-05-01','2019-12-15','2020-05-23']})
sales['A_Date'] = pd.to_datetime(sales['A_Date'], errors='coerce')
sales['Sold'] = sales['A_Date'].apply(lambda x:is_sold(x, sales['A_Date'][0],  x.day))
print(sales)
QHarr
  • 83,427
  • 12
  • 54
  • 101