
I need to use the index location (the integer position) of each value in a time series inside a lambda function, because the transformation depends on that position. This is similar to the question "Can I use index information inside the map function?", but with a pandas DataFrame that has a DatetimeIndex.

The formula I want the lambda function to compute is:

new value = (position of value in the index) × (1 / length of timeseries) + value

The point of this function is to add a linear trend to the timeseries. The output I expect is an increase of +1 by the end of the timeseries relative to the first time step.
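Applied to every step, this formula adds a trend that grows by 1/N per step (N = length of the series), so the last value is raised by (N - 1)/N rather than exactly 1. A minimal sketch on a hypothetical 4-value series illustrates this; the values and the `np.linspace` variant that reaches exactly +1 are illustrative assumptions, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-step series standing in for df['A']
dates = pd.date_range(start="2018-10-01", periods=4, freq="D")
a = pd.Series([10.0, 20.0, 30.0, 40.0], index=dates)

n = len(a)
# Formula from the question: position * (1 / length) + value
trend = np.arange(n) * (1 / n)          # 0.0, 0.25, 0.5, 0.75
with_trend = a + trend                  # keeps the DatetimeIndex

# Assumed variant that ends at exactly +1 (spacing 1 / (n - 1))
trend_to_one = np.linspace(0.0, 1.0, n)  # 0.0, 1/3, 2/3, 1.0
```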

My approach so far has been to combine `enumerate` and `get_loc`, like so:

import numpy as np
import pandas as pd

dates = pd.date_range(start='2018-10-01', end='2019-09-30', freq='D')
df = pd.DataFrame(np.random.randint(0,100,size=(365, 4)), columns=list('ABCD'), index=dates)

a = df['A']
# Tuple unpacking in the lambda signature is Python 2 syntax; Python 3 rejects it
test = map(lambda (idx, val): df.index.get_loc(idx) * (1/len(df.index)) + val, enumerate(a))

I get the following error message:

File "<ipython-input-6-8fb927ed0ecd>", line 8
test = map(lambda (idx, val): df.index.get_loc(idx) * (1/len(df.index)) + val, enumerate(a))
                  ^
SyntaxError: invalid syntax
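The `SyntaxError` comes from `lambda (idx, val): ...`: unpacking a tuple in a function or lambda signature was Python 2 syntax and was removed in Python 3 (PEP 3113). `enumerate` already yields the integer position, so `get_loc` is not needed either. A minimal Python 3 sketch, assuming the goal is the formula above applied to `df['A']`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2018-10-01", end="2019-09-30", freq="D")
df = pd.DataFrame(np.random.randint(0, 100, size=(365, 4)),
                  columns=list("ABCD"), index=dates)
a = df["A"]
n = len(a)

# The lambda takes a single (idx, val) pair and indexes into it
test = list(map(lambda pair: pair[0] * (1 / n) + pair[1], enumerate(a)))

# Equivalent comprehension: unpacking is allowed in the for clause
test2 = [idx * (1 / n) + val for idx, val in enumerate(a)]

# Wrap the plain list in a Series to get the DatetimeIndex back
result = pd.Series(test2, index=a.index)
```

The comprehension is usually preferred over `map` with a lambda in Python 3, since the unpacking stays readable.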
AlexaB
  • Hi AlexaB, would you mind sharing a [mcve](/help/mcve)? – rpanai Sep 04 '19 at 19:29
  • I have updated with what I hope is an mcve! – AlexaB Sep 04 '19 at 21:16
  • Can you post the expected output? Do you want to calculate that formula for every column? – rpanai Sep 04 '19 at 21:22
  • Hi @AlexaB, was my answer somehow useful? – rpanai Sep 06 '19 at 14:32
  • Hi rpanai. I had a little trouble following your last line of code (I am new to this!). It was useful in moving me towards using NumPy. In the end I used the code below, which is not dissimilar to yours: `a = df['A']` then `a + (1*np.arange(1, a.shape[0]+1))/a.shape` – AlexaB Sep 07 '19 at 15:20
  • The last line only serves to add `x` to every column of `df`. Since `df` has shape `N x 4` while `x` has shape `(N,)`, I just made 4 copies of `x`. If you found my answer helpful, consider upvoting it too. – rpanai Sep 09 '19 at 11:10
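AlexaB's final approach from the comments can be written as a self-contained sketch (the random data is a stand-in). It uses positions 1..N instead of 0..N-1, so the added trend runs from 1/N up to exactly +1 at the last step:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2018-10-01", end="2019-09-30", freq="D")
df = pd.DataFrame(np.random.randint(0, 100, size=(365, 4)),
                  columns=list("ABCD"), index=dates)
a = df["A"]
n = a.shape[0]

# Positions 1..n scaled by 1/n: the trend ends at exactly +1
trend = np.arange(1, n + 1) / n
# Series + same-length ndarray adds element-wise and keeps a's DatetimeIndex
result = a + trend
```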

1 Answer


IIUC, you can first compute (position in index) × (1 / length of timeseries) for every row as a NumPy array, and then add it to the values in `df`:

import pandas as pd
import numpy as np

dates = pd.date_range(start='2018-10-01', periods=365)

df = pd.DataFrame(np.random.randint(0,100,size=(365, 4)),
                  columns=list('ABCD'), index=dates)
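# The index holds datetimes, not positions, so build the positions with arange
x = np.arange(len(df)) * 1/len(df)
# df.values has shape (365, 4) and x has shape (365,), so plain broadcasting
# fails here; stack 4 copies of x into a (365, 4) array instead
res = np.array([x]*4).T + df.values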
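Instead of stacking four copies of `x`, the same values can come from broadcasting `x` as a column: `x[:, None]` has shape `(N, 1)`, which NumPy stretches across the 4 columns. If you want to keep the dates and column names, `DataFrame.add` with `axis=0` adds a Series row-wise, aligned on the index. A sketch that rebuilds `df` and `x` so it runs standalone:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2018-10-01", periods=365)
df = pd.DataFrame(np.random.randint(0, 100, size=(365, 4)),
                  columns=list("ABCD"), index=dates)
x = np.arange(len(df)) * 1 / len(df)

# Same values as np.array([x]*4).T + df.values, via broadcasting an (N, 1) column
res_np = df.values + x[:, None]

# Keeps the DatetimeIndex and column labels: add x to every column, row by row
res_df = df.add(pd.Series(x, index=df.index), axis=0)
```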
rpanai