I'm following along with the blog below as a Python/R novice and having trouble adding a loop to the code below. Currently I'm able to get the code to run in full, but it only outputs the seasonal flag for 1 customer. I would like it to loop and run for all of my customers.

datamovesme.com/2018/07/01/seasonality-python-code

    ##Here comes the R code piece
    try:
        seasonal = r('''
        fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
        fit$seasonal
        ''')
    except: seasonal = 1
    seasonal_output = seasonal_output.append({'customer_id':customerid, 'seasonal': seasonal}, ignore_index=True)
    print(f' {customerid} | {seasonal} ')
print(seasonal_output)
seasonal_output.to_csv(outfile)

I've tried many combinations of code to get it to loop, too many to list here. The blog shows the existing data frames and time-series objects that are available to us. I am not sure which one to use or how to pass it to the R code.

Thanks!

David Squires
  • Sorry about that, I have fixed indentation. Oh, this is already running for all Customer_id? I actually have proper indentation (if my edit is correct) in the actual code that's been running, but I always just get 1 row output. I see the for loop but didn't realize that applied to the R code section. Is the issue that I'm running it step by step and not as 1 big script? – David Squires Oct 22 '18 at 16:27
  • Did you incorporate the `for` ... `groupby` line? Which runs across all customerid? – Parfait Oct 22 '18 at 16:27
  • I have this line: "for customerid, dataForCustomer in filledIn.groupby(by=['customer_id']):" but I am running it step by step. I am now noticing the indentation, the TRY is part of the loop it looks like? and I just need to run the entire script instead of 1 step at a time? – David Squires Oct 22 '18 at 16:29

1 Answer

The code in the linked blog has two issues:

  1. The code does not indent lines properly, and indentation is a requirement of Python syntax. This may be due to the website's rendering of whitespace or tabs, but it is a disservice to readers, since a missing indent changes the output.

  2. The code ignores the inefficiency of appending to data frames: never call DataFrame.append or pd.concat inside a for-loop, because it leads to quadratic copying. Instead, since seasonal is a single value per customer, build a list of dictionaries and pass it to the pd.DataFrame() constructor once, outside the loop. (DataFrame.append was also removed in pandas 2.0, so that line fails outright on current pandas.)
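A minimal sketch of the list-of-dictionaries pattern from point 2. The customer IDs and seasonal values are made up for illustration; in the real loop, seasonal comes from the R call.

```python
import pandas as pd

# Hypothetical (customer_id, seasonal) results, standing in for the R output.
results = [("C1", 1), ("C2", 0), ("C3", 1)]

# Inside the loop: append a plain dict to a list (cheap, no DataFrame copy).
rows = []
for customer_id, seasonal in results:
    rows.append({"customer_id": customer_id, "seasonal": seasonal})

# After the loop: build the DataFrame exactly once.
seasonal_output = pd.DataFrame(rows)
print(seasonal_output)
```

Calling pd.concat once per iteration would copy every row accumulated so far on each pass, which is where the quadratic cost comes from; the list append avoids that.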

After resolving the above issues and running the entire code block at once (not one step at a time), your solution should output a data frame covering all customer IDs.
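The symptom in the question (only one row of output) is what happens when the line that records a result runs outside the for body. A small self-contained illustration with hypothetical customer IDs:

```python
customers = ["C1", "C2", "C3"]  # hypothetical IDs

inside = []
for customerid in customers:
    seasonal = 1                # stand-in for the R result
    inside.append(customerid)   # indented: runs once per customer

outside = []
for customerid in customers:
    seasonal = 1
outside.append(customerid)      # dedented: runs once, after the loop ends

print(inside)   # ['C1', 'C2', 'C3']
print(outside)  # ['C3']
```

Running the try block by itself after the loop has already finished behaves like the outside case: customerid still holds the last group's key, so only one row is produced.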

# ... same above assignments ...
outfile = '[put your file path here].csv'
df_list = []

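# group by the column name (not a 1-element list) so customerid is a scalar, not a tuple, in pandas 2.x
for customerid, dataForCustomer in filledIn.groupby('customer_id'):
    # first/last rows give the start/end of this customer's series (assumes rows are sorted by date)
    startYear = int(dataForCustomer['yr'].iloc[0])
    startMonth = int(dataForCustomer['mnth'].iloc[0])
    endYear = int(dataForCustomer['yr'].iloc[-1])
    endMonth = int(dataForCustomer['mnth'].iloc[-1])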

    #Creating a time series object
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear,startMonth),
                          end=base.c(endYear, endMonth), 
                          frequency=12)
    r.assign('customerTS', customerTS)

    ##Here comes the R code piece
    try:
        seasonal = r('''
                        fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
                        fit$seasonal
                     ''')
    except Exception:
        # R call failed for this customer: fall back to seasonal = 1, as in the original code
        seasonal = 1

    # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pd.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)
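To check the loop structure without R installed, here is a sketch that swaps the tbats call for a hypothetical stub, fake_seasonal. It is NOT a seasonality test; it only raises on short series so the except branch gets exercised. The filledIn data below is made up; only the customer_id and usage column names come from the answer.

```python
import pandas as pd

# Made-up monthly usage for two customers.
filledIn = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "usage": [5, 0, 7, 3, 4],
})

def fake_seasonal(usage: pd.Series) -> int:
    """Hypothetical stand-in for the R tbats call (not a real test).
    Raises on series shorter than 3 points to mimic an R failure."""
    if len(usage) < 3:
        raise ValueError("series too short")
    return 0

df_list = []
# Plain column name as the key, so customerid is "A"/"B", not ("A",).
for customerid, dataForCustomer in filledIn.groupby("customer_id"):
    try:
        seasonal = fake_seasonal(dataForCustomer["usage"])
    except ValueError:
        seasonal = 1  # same fallback value as the answer's except branch
    df_list.append({"customer_id": customerid, "seasonal": seasonal})

seasonal_output = pd.DataFrame(df_list)
print(seasonal_output)
```

On pandas 2.x, iterating over groupby(by=['customer_id']) (a one-element list) yields tuple keys such as ('A',), which would then land in the customer_id column; grouping by the plain column name avoids that.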
Parfait
  • Thanks very much @Parfait !! I actually did find a different version of the blog that had proper indentation: https://www.kristenkehrer.com/seasonality-code – David Squires Oct 22 '18 at 16:56
  • I'm sure this is correct, but I will try tonight to see if I can get it working. It makes a lot of sense that I missed the nesting within the original for loop. And thank you for the dictionary suggestion!!!! – David Squires Oct 22 '18 at 16:58
  • But that version still appends to a data frame in a loop. If you have many customers, your poor machine will wrestle with heavy quadratic copying! – Parfait Oct 22 '18 at 16:59
  • Thank you @Parfait, using your code and adjusting the indentation got the code to run for all customers. I sent through 2,000 customers that all have 2 years' worth of monthly sales data (including 0 months), but got 0 customers flagged as seasonal, so I think there's still an issue somewhere. – David Squires Oct 23 '18 at 12:22
  • Where is the object `dataForOwner.SENDS`, used in the `stats.ts()` call, defined? In fact, Python should have raised a `NameError` here. Should it be `dataForCustomer.usage`? – Parfait Oct 23 '18 at 14:52
  • Yes exactly @Parfait, I had to fix it by swapping in dataForCustomer.usage – David Squires Oct 23 '18 at 15:43
  • Does that solve seasonal issue? And are customer values all 0's or 1's? – Parfait Oct 23 '18 at 15:57
  • No, it did not solve the issue. I had fixed that, then was able to get it to run for the 2000 customers. The values are all actually NULL, not 0 or 1. Should I submit a new help request? – David Squires Oct 23 '18 at 16:11
  • I did submit a new request, to close out this question. This is the link if you are able to take a look. Thank you for all your help ! https://stackoverflow.com/questions/52954983/r-tbats-model-seasonal-customer-flag-no-results – David Squires Oct 23 '18 at 17:47