2

Context

  • There is a dataframe of customer invoices and their due dates.(Identified by customer code)
  • Week(s) need to be added depending on customer code
  • Model is created to persist the list of customers and week(s) to be added

What is done so far:

Models.py

class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)

helper.py

from .models import BpShift
# used in views later
def week_shift(self, df):
        df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
            weeks=BpShift.objects.get(pk=df['BpCode']).weeks)

I realised my understanding of Dataframes is seriously flawed. df['A'] and df['B'] would return Series. Of course, timedelta wouldn't work like this(weeks=BpShift.objects.get(pk=df['BpCode']).weeks).

Dataframe

d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d)

Customer List csv

BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1

Error

BpShift matching query does not exist.

Commentary

I used these methods in hope that I would be able to change the dataframe at once, instead of using df.iterrows(). I have recently been avoiding for loops like a plague and wondering if this is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!

hiayus
  • 35
  • 4

1 Answers1

1

This question Python & Pandas: series to timedelta will help to take you from Series to timedelta. And although

pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)

will give you a Series of integers, I doubt the order is the same as in df['BpCode']. Because it depends on the django Model and database backend.

So you might be better off to explicitly create not a Series, but a DataFrame with pk and weeks columns so you can use df.join. Something like this

pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)

should give you a DataFrame that you can join with.

So combined this should be the gist of your code:

django_response = [('customer1', 1), ('customer2', '2')]

d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
             DueDate  weeks new_duedate
BpCode                                 
customer1 2020-05-30      1  2020-06-06
customer2 2020-04-30      2  2020-05-14

You were right to want to avoid looping. This approach gets all the data in one SQL query from your Django model, by using filter. Then does a left join with the DataFrame you already have. Casts the dates and weeks to the right types and then computes a new due date using the whole columns instead of loops over them.

NB the left join will give NaN and NaT for customers that don't exist in your Django database. You can either avoid those rows by passing how='inner' to df.join or handle them whatever way you like.

Chris Wesseling
  • 6,226
  • 2
  • 36
  • 72
  • Hi Chris, thanks for the informative reply. Where can I find more information on "pk_in"? Must have missed it in the documentation – hiayus Aug 02 '21 at 08:20
  • Hi Chris, please ignore the above comment, it double underscored pk__in. – hiayus Aug 02 '21 at 08:25
  • @hiayus Oh yeah, sorry for the typo. It's should be mentioned in the doc for `filter`. (I copy pasted the typo to the second codeblock :) fixed the answer. – Chris Wesseling Aug 02 '21 at 09:48
  • @hiayus I've updated the answer a bit to integrate the things.. Is this what you are after? – Chris Wesseling Aug 02 '21 at 11:47
  • Hi Chris, thanks for the edited response! I was lost when the join gave me an object and int64 error. This clears up a lot and is very educational. :D Leaving this here just in case others would like to understand the error as well! https://stackoverflow.com/questions/57795399/trouble-with-df-join-valueerror-you-are-trying-to-merge-on-object-and-int64 – hiayus Aug 03 '21 at 07:16