Apply a function from a groupby transform

Question

My pandas DataFrame looks like this:

Date    Ticker  Open    High    Low Adj Close   Adj_Close   Volume
2016-04-18  vws.co  445.0   449.2   441.7   447.3   447.3   945300
2016-04-19  vws.co  449.0   455.8   448.3   450.9   450.9   907700
2016-04-20  vws.co  451.0   452.5   435.4   436.6   436.6   1268100
2016-04-21  vws.co  440.1   442.9   428.4   435.5   435.5   1308300
2016-04-22  vws.co  435.5   435.5   435.5   435.5   435.5   0
2016-04-25  vws.co  431.0   436.7   424.4   430.0   430.0   1311700
2016-04-18  nflx    109.9   110.7   106.02  108.4   108.4   27001500
2016-04-19  nflx    99.49   101.37  94.2    94.34   94.34   55623900
2016-04-20  nflx    94.34   96.98   93.14   96.77   96.77   25633600
2016-04-21  nflx    97.31   97.38   94.78   94.98   94.98   19859400
2016-04-22  nflx    94.85   96.69   94.21   95.9    95.9    15786000
2016-04-25  nflx    95.7    95.75   92.8    93.56   93.56   14965500

I have a program in which one of the functions, which has embedded functions, successfully runs a groupby.

The line looks like this:

df['MA3'] = df.groupby('Ticker').Adj_Close.transform(lambda group: pd.rolling_mean(group, window=3))

See my initial question and the data format here:

Select only one value in df col rows in same df for calc results from different val, and calc df only on one ticker at a time

It has now dawned on me that, rather than doing the groupby in each of my five embedded functions, I would rather run the groupby once in the main program that calls the top function, so that all the embedded functions work on the already filtered DataFrame.

How do I apply my main function with groupby so that it filters my DataFrame and I only work on one ticker (one value in the 'Ticker' column) at a time?

The 'Ticker' column contains company identifiers such as 'aapl', 'msft', 'nflx', etc., with time-series data for a time window.

Thanks a lot, Karasinski. This is close to what I want, but I get an error.

When I run:

def Screener(df_all, group):

    # Copy df_all to df for single ticker operations
    df = df_all.copy()
    def diff_calc(df,ticker):

        df['Difference'] = df['Adj_Close'].diff()
        return df
    df = diff_calc(df, ticker)
    return df_all

for ticker in stocklist:

    df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)

I get this error:

Traceback (most recent call last):

  File "<ipython-input-2-d7c1835f6b2a>", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 144, in <module>
    df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
    self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
    res = f(group)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
    return func(g, *args, **kwargs)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 112, in Screener
    df = diff_calc(df, ticker)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 70, in diff_calc
    df['Difference'] = df['Adj_Close'].diff()

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
    raise KeyError(key)

KeyError: 'Adj_Close'

And when I use functools like so

df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker))

I get the same error as above...

Traceback (most recent call last):

  File "<ipython-input-5-d7c1835f6b2a>", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 148, in <module>
    df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, [ticker]))

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
    self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
    res = f(group)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
    return func(g, *args, **kwargs)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 114, in Screener
    df = diff_calc(df, ticker)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 72, in diff_calc
    df['Difference'] = df['Adj_Close'].diff()

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
    raise KeyError(key)

KeyError: 'Adj_Close'

Edit following Karasinski's edit of 31/5:

When I run the last suggestion from Karasinski, I get this error:

mmm
mmm
nflx
vws.co
Traceback (most recent call last):

  File "<ipython-input-4-d7c1835f6b2a>", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 173, in <module>
    df_all[['mean', 'max', 'median', 'min']] = df_all.groupby('Ticker').apply(group_func)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 670, in _python_apply_general
    not_indexed_same=mutated)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 2785, in _wrap_applied_output
    not_indexed_same=not_indexed_same)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex_axis(ax, axis=self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\frame.py", line 2508, in reindex_axis
    fill_value=fill_value)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1841, in reindex_axis
    {axis: [new_index, indexer]}, fill_value=fill_value, copy=copy)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1865, in _reindex_with_indexers
    copy=copy)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\internals.py", line 3144, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

I'm not sure what you're asking. Do you want to calculate five other indicators besides rolling mean? And you want to only call `groupby` once rather than five times? — IanS, May 20 '16 at 07:55

John Karasinski · Accepted Answer · 2016-05-31T05:44:26.263

Building on an answer to your previous question, we can set up with

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

You can then make a function which calculates whatever statistics you'd like, such as:

def various_indicators(group):
    mean = pd.rolling_mean(group, window=3)
    max = pd.rolling_max(group, window=3)
    median = pd.rolling_median(group, window=3)
    min = pd.rolling_min(group, window=3)

    return pd.DataFrame({'mean': mean,
                         'max': max, 
                         'median': median, 
                         'min': min})

To assign these new columns to your dataframe, you would then do a groupby and then apply the function by

df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker').Adj_Close.apply(various_indicators)
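Later pandas releases dropped the top-level `pd.rolling_*` functions in favour of the `.rolling()` method, so the same idea may need a slightly different spelling there. The following is only a sketch under that assumption: the toy frame reuses a few `Adj_Close` values from the question, and the `column` and `window` parameters are illustrative names. It uses `transform`, which returns values aligned with the original rows.

```python
import pandas as pd

# Toy data: a few rows from the question, two tickers sharing the same dates.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-04-18", "2016-04-19", "2016-04-20", "2016-04-21"] * 2),
    "Ticker": ["vws.co"] * 4 + ["nflx"] * 4,
    "Adj_Close": [447.3, 450.9, 436.6, 435.5, 108.4, 94.34, 96.77, 94.98],
})


def various_indicators(df, column="Adj_Close", window=3):
    """Return a copy of df with rolling statistics computed per ticker."""
    out = df.copy()
    grouped = out.groupby("Ticker")[column]
    # transform keeps the original row index, so plain assignment lines up
    # and no value leaks from one ticker into the next.
    out["mean"] = grouped.transform(lambda s: s.rolling(window).mean())
    out["max"] = grouped.transform(lambda s: s.rolling(window).max())
    out["median"] = grouped.transform(lambda s: s.rolling(window).median())
    out["min"] = grouped.transform(lambda s: s.rolling(window).min())
    return out


result = various_indicators(df)
print(result)
```

The first two rows of each ticker come out as NaN because a three-row window is not yet full there.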

EDIT

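Regarding your further questions in the comments on this answer: `df_all.groupby('Ticker').Adj_Close.apply(...)` passes only the `Adj_Close` Series of each group to the function, which is why `df['Adj_Close']` inside it raises `KeyError: 'Adj_Close'`. To extract additional information from the dataframe, you should instead pass the entire group rather than just the single column.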

def group_func(group):
    ticker = group.Ticker.unique()[0]
    adj_close = group.Adj_Close

    return Screener(ticker, adj_close)

def Screener(ticker, adj_close):
    print(ticker)    

    mean = pd.rolling_mean(adj_close, window=3)
    max = pd.rolling_max(adj_close, window=3)
    median = pd.rolling_median(adj_close, window=3)
    min = pd.rolling_min(adj_close, window=3)

    return pd.DataFrame({'mean': mean,
                         'max': max, 
                         'median': median, 
                         'min': min})

You can then assign these columns in a similar way as above

df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker').apply(group_func)
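On the "cannot reindex from a duplicate axis" error in the question: a likely cause, assuming `Date` is the index of `df_all`, is that several tickers share the same dates, so the index contains duplicates and the combined result cannot be aligned back onto it. One way around this that also hands the ticker name to the function is to group once and iterate over the groups explicitly. This is a sketch, not the only option; `screener` and its `window` parameter are hypothetical stand-ins for the real function, and the values are a few rows from the question.

```python
import pandas as pd


def screener(ticker, group, window=3):
    """Hypothetical per-ticker function: gets the ticker name and its rows."""
    out = group.copy()
    out["Difference"] = out["Adj_Close"].diff()
    out["MA3"] = out["Adj_Close"].rolling(window).mean()
    return out


# Date is the index and repeats across tickers, as in the question's data.
dates = ["2016-04-18", "2016-04-19", "2016-04-20", "2016-04-21"] * 2
df_all = pd.DataFrame(
    {
        "Ticker": ["vws.co"] * 4 + ["nflx"] * 4,
        "Adj_Close": [447.3, 450.9, 436.6, 435.5, 108.4, 94.34, 96.77, 94.98],
    },
    index=pd.DatetimeIndex(dates, name="Date"),
)

# Group once; each iteration yields the ticker and only that ticker's rows.
pieces = [screener(ticker, group) for ticker, group in df_all.groupby("Ticker")]

# Stacking the pieces avoids aligning on the duplicated Date index.
result = pd.concat(pieces)
print(result)
```

Because the pieces are concatenated instead of assigned back column by column, the duplicated dates never have to be reindexed. `groupby(...).apply` also forwards extra positional and keyword arguments to the function, as the `func(g, *args, **kwargs)` frame in the traceback shows, but the function then receives the group first and the extra arguments after it.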

edited May 31 '16 at 05:44

answered May 22 '16 at 07:04

John Karasinski

This certainly looks interesting. But please see edit to my question, why it fails... Thanks please. – Excaliburst May 26 '16 at 10:24
There are a few things going on in that error. One of the reasons this is failing is that this is the improper way to pass arguments into `apply` functions (see here: http://stackoverflow.com/questions/12182744/python-pandas-apply-a-function-with-arguments-to-a-series). I'm also not clear: why are you making a copy of your DataFrame? The `groupby` should pass in just the group that you want, so you shouldn't have to iterate this many times. – John Karasinski May 27 '16 at 16:13
But as I see it your link to the other question deals with pd.Series not pd.DataFrame as I am handling here... Where do I go wrong? – Excaliburst May 28 '16 at 09:05
Is there a column called `Adj_Closed` in your dataframe? This seems to work from my end, so without seeing some example data from your dataframe I'm not sure why this is failing. – John Karasinski May 28 '16 at 10:20
I have now added an excerpt from my pandas DataFrame 'df_all' to my question. And yes, I have a column named 'Adj_Close'. What seems to work with your df? I cannot get it to run... – Excaliburst May 28 '16 at 13:57
I am no longer clear on what your question is. The above code allows you to apply a function to a groupby object. Are you asking a new question now? – John Karasinski May 28 '16 at 19:01
No, I am still trying to do some form of operation by groupby, so that I can process tickers one at a time. I just added the DataFrame so you can see the df in my original question. And I tried the solution in your linked question with functools.partial, but I get an error (see question above). – Excaliburst May 29 '16 at 13:39
My above answer _does_ apply a function to one ticker at a time. – John Karasinski May 29 '16 at 23:09
I need 'ticker' as parameter/argument for the function. I don't believe your groupby allows for params. Does it? And I am sorry I haven't mentioned this earlier. But that is why I think functools.partial is the way to go. – Excaliburst May 30 '16 at 05:47
I went ahead and edited my response. I think this should address all your questions. – John Karasinski May 31 '16 at 05:44
Awesome, Karasinski, I thank you from the bottom of my heart. I haven't had time to test yet, but will give a heads-up later. – Excaliburst May 31 '16 at 16:10
Karasinski - thanks. But I find it hard to grasp. How does the 'group' fit into this code. You do not define it anywhere. So it must come from groupby as an object. And you return a function with arguments in group_func. Nice, but also quite confusing... to me at least. Can you maybe talk me and other users through your code? Please. And about the error (reindex from duplicate axis)... Any ideas? – Excaliburst Jun 01 '16 at 09:02
Please read the documentation for `groupby` http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html – John Karasinski Jun 01 '16 at 19:29
