
I'm currently developing something and was wondering whether the new `match` statement in Python 3.10 would be suited to a use case like this one, which is essentially a series of conditional checks.

As input I have a timestamp and a dataframe with dates and values. The goal is to loop over all rows and add each value to the corresponding bin based on its date. The bin a value goes into depends on how its date relates to the timestamp: a date within 1 month of the timestamp is placed in bin 1, a date within 2 months in bin 2, etc.

The code that I have now is as follows:

```python
import pandas as pd

# df (dates in column 0, values in column 1) and timestamp are defined elsewhere
bins = [0] * 7

for date, value in zip(df.iloc[:,0],df.iloc[:,1]):
    match [date,value]:
        case [date,value] if date < timestamp + pd.Timedelta(1,'m'):
            bins[0] += value
        case [date,value] if date > timestamp + pd.Timedelta(1,'m') and date < timestamp + pd.Timedelta(2,'m'):
            bins[1] += value
        case [date,value] if date > timestamp + pd.Timedelta(2,'m') and date < timestamp + pd.Timedelta(3,'m'):
            bins[2] += value
        case [date,value] if date > timestamp + pd.Timedelta(3,'m') and date < timestamp + pd.Timedelta(4,'m'):
            bins[3] += value
        case [date,value] if date > timestamp + pd.Timedelta(4,'m') and date < timestamp + pd.Timedelta(5,'m'):
            bins[4] += value
        case [date,value] if date > timestamp + pd.Timedelta(5,'m') and date < timestamp + pd.Timedelta(6,'m'):
            bins[5] += value
```

Correction: originally I stated that this code does not work. It turns out that it actually does. However, I am still wondering if this would be an appropriate use of the match statement.

Jeroen Vermunt
  • Is there any reason why you don't just use `if`/`elif`? You are not actually matching different patterns anywhere, and you don't even need the destructuring (unpacking) into separate variables. – MisterMiyagi Mar 08 '22 at 10:41
  • Your code has a fair amount of duplication regardless - perhaps create a function that will calculate the appropriate bin for a given date, then the code is simply `bins[get_bin(date)] += value` – Andrew McClement Mar 08 '22 at 10:43
  • I would also investigate how to do this with Pandas functions (see, e.g., [here](https://stackoverflow.com/a/45273750/3214872)); iterating over each row with this many tests will be extremely slow (and you're creating a ton of duplicated temporary objects at each iteration too...). – GPhilo Mar 08 '22 at 10:47
  • The actual task is how to bin pandas time data by minutes, isn't it? The looping and such is just how you *tried* to do this. What do you expect to happen for dates that are *equal* to an offset? – MisterMiyagi Mar 08 '22 at 10:53
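The `get_bin(date)` idea from the comments can be sketched as a small helper. This is a minimal sketch, not code from the thread: it passes `timestamp` explicitly, uses plain `datetime`/`timedelta` so it runs without pandas, and assumes six one-minute bins. The question's text talks about months, but in pandas `pd.Timedelta(1, 'm')` is one minute, which is why the width here is a minute. Dates before `timestamp` go to bin 0 (as in the question's first case), and a date exactly on an edge goes to the higher bin.

```python
from datetime import datetime, timedelta

# Assumptions: six bins, each one minute wide, starting at `timestamp`.
BIN_WIDTH = timedelta(minutes=1)
N_BINS = 6

def get_bin(date, timestamp):
    """Hypothetical helper: return the bin index for `date`, or None past the last bin."""
    # timedelta // timedelta yields an int (floored, so earlier dates give negatives)
    index = (date - timestamp) // BIN_WIDTH
    if index >= N_BINS:
        return None
    # Dates before the timestamp are folded into bin 0
    return max(index, 0)

timestamp = datetime(2022, 3, 8, 10, 0)
bins = [0] * N_BINS
rows = [
    (timestamp + timedelta(seconds=30), 5),             # bin 0
    (timestamp + timedelta(minutes=1), 7),              # exactly on an edge -> bin 1
    (timestamp + timedelta(minutes=5, seconds=59), 2),  # bin 5
    (timestamp + timedelta(minutes=6), 100),            # past the last bin -> ignored
]
for date, value in rows:
    b = get_bin(date, timestamp)
    if b is not None:
        bins[b] += value

print(bins)  # [5, 7, 0, 0, 0, 2]
```

The loop body then reduces to a single lookup instead of six guarded cases. The same subtraction and floor division should also work on pandas `Timestamp`/`Timedelta` values.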

2 Answers


I'd say it's not a good use of structural pattern matching, because there is no actual structure to match. You are only checking the values of a single object, so an if/elif chain is a much better, more readable and more natural choice.

I have two more issues with the way you wrote it:

  1. You do not handle values that fall exactly on a bin edge: a date equal to `timestamp + pd.Timedelta(1,'m')` is neither `<` nor `>` that edge, so it matches no case and is silently dropped.
  2. You check the same condition twice. Once you reach a given case, you already know that all the previous ones failed, so you don't need `date > timestamp + pd.Timedelta(1,'m') and ...`: if the earlier check `date < timestamp + pd.Timedelta(1,'m')` failed, you already know the date is not smaller. (Equality is an edge case, but it has to be handled somehow anyway.)
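To make the first point concrete, here is a small sketch that reproduces the question's six guards in a loop, using plain `datetime` values and one-minute steps (an assumption that mirrors `pd.Timedelta(k, 'm')`). The function name `original_bin` is hypothetical; it returns `None` when no case would have matched.

```python
from datetime import datetime, timedelta

timestamp = datetime(2022, 3, 8, 10, 0)
one = timedelta(minutes=1)

def original_bin(date):
    """Hypothetical re-creation of the question's guards; None means no case matched."""
    for k in range(6):
        lower = timestamp + k * one
        upper = timestamp + (k + 1) * one
        if k == 0 and date < upper:           # first case: only an upper bound
            return 0
        if k > 0 and lower < date < upper:    # later cases: strict on both sides
            return k
    return None

print(original_bin(timestamp + one))                          # None: exactly on an edge
print(original_bin(timestamp + one + timedelta(seconds=1)))   # 1
```

Because both comparisons are strict, every edge value from `timestamp + 1 min` onwards falls through all the cases.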

All in all I think this would be the cleaner solution:

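```python
import pandas as pd

# df, timestamp and bins = [0] * 7 are assumed to be defined as in the question
for date, value in zip(df.iloc[:, 0], df.iloc[:, 1]):
    # Each branch only needs an upper bound: earlier branches already ruled out smaller dates
    if date < timestamp + pd.Timedelta(1,'m'):
        bins[0] += value
    elif date < timestamp + pd.Timedelta(2,'m'):
        bins[1] += value
    elif date < timestamp + pd.Timedelta(3,'m'):
        bins[2] += value
    elif date < timestamp + pd.Timedelta(4,'m'):
        bins[3] += value
    elif date < timestamp + pd.Timedelta(5,'m'):
        bins[4] += value
    elif date < timestamp + pd.Timedelta(6,'m'):
        bins[5] += value
    # dates at or after timestamp + 6 minutes fall through and are ignored
```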
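Since the if/elif chain compares the date against upper edges in increasing order, the same logic can also be written as a binary search over a sorted list of edges with the standard-library `bisect` module. This is a swapped-in technique, not part of the answer; the sketch uses plain `datetime` values and assumes the same six one-minute edges. The helper name `add_to_bins` is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

timestamp = datetime(2022, 3, 8, 10, 0)
# Upper edges of the six bins: timestamp + 1 min, ..., timestamp + 6 min
edges = [timestamp + timedelta(minutes=k) for k in range(1, 7)]

def add_to_bins(rows, bins):
    """Hypothetical helper: same behaviour as the if/elif chain above."""
    for date, value in rows:
        # Count of edges <= date, i.e. the index of the first edge strictly above date
        i = bisect_right(edges, date)
        if i < len(bins):  # i == 6 means date >= timestamp + 6 min: ignored
            bins[i] += value
    return bins

rows = [
    (timestamp - timedelta(minutes=3), 1),                 # before timestamp -> bin 0
    (timestamp + timedelta(minutes=1), 10),                # on an edge -> bin 1
    (timestamp + timedelta(minutes=4, seconds=30), 100),   # bin 4
    (timestamp + timedelta(minutes=7), 1000),              # past the last edge -> ignored
]
print(add_to_bins(rows, [0] * 6))  # [1, 10, 0, 0, 100, 0]
```

`bisect_right` sends a date that equals an edge to the higher bin, which matches the strict `<` in each branch. Unlike fixed-width arithmetic, this also works if the edges are unevenly spaced.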
matszwecja
  • Thanks for the suggestion. I ended up using this solution, as it is the most readable. I'll keep my hands off the match statement until I actually need to check structures, like you said. – Jeroen Vermunt Mar 17 '22 at 16:48

This should really be done directly with Pandas functions:

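```python
import pandas as pd
from datetime import datetime

timestamp = datetime.now()
# Bin edges: a far-past lower bound, timestamp + 0..5 minutes, and a far-future upper bound.
# The outer bounds act as catch-all bins for dates before timestamp and from timestamp + 5 min on.
bins = (
    [pd.Timestamp(year=1970, month=1, day=1)]
    + [pd.Timestamp(timestamp) + pd.Timedelta(i, 'm') for i in range(6)]
    + [pd.Timestamp(year=2100, month=1, day=1)]
)
n_samples = 1000

# Sample data: one row per second, starting at timestamp
data = {
  'date': [pd.to_datetime(timestamp) + pd.Timedelta(i, 's') for i in range(n_samples)],
  'value': list(range(n_samples))
}

df = pd.DataFrame(data)

# right=False makes every interval closed on the left and open on the right: [edge_k, edge_k+1)
df['bin'] = pd.cut(df.date, bins, right=False)
print(df.groupby('bin', observed=False).value.sum())
```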
GPhilo