
I'm trying to calculate some KPIs for different companies/tickers. My stock info resides in a df with this structure:

            Ticker        Open        High         Low   Adj Close   Volume
Date                                                                       
2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
...            ...         ...         ...         ...         ...      ...
2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900

I need to do the calculations on a per-ticker basis, so that the moving-average results are not skewed when the calculation crosses from one company/ticker to the next.

Could someone please help me with that?

Splitting and concatenating dfs: How would I logically split the tickers into separate dfs (perhaps named after the ticker), and afterwards how would I concatenate them all back into a single df?

Selection from the same df: Or should I select only the relevant ticker rows within the same df? (Maybe like this: Splitting dataframe into multiple dataframes.)

Any help is appreciated...

---------- From Alexander's answer below. This code

df['MA1'] = df.groupby('Ticker').df['Adj Close'].transform(lambda group: pd.rolling_mean(group, window=10))

throws this error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'df'

Or, closer to a cut-and-paste:

Adj_Close = df['Adj Close']

df['MA3'] = df.groupby('Ticker').Adj_Close.transform(lambda group: pd.rolling_mean(group, window=3))

This no longer throws an error. It was the column name 'Adj Close', with its space, that groupby does not like... This must be a bug.

If I try this syntax, which should work, it doesn't:

df['MA3'] = df.groupby('Ticker').df["Adj Close"].transform(lambda group: pd.rolling_mean(group, window=3))

It throws this error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'df'

But I can work with the above. Thanks, Alexander.
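
For reference, selecting the column with brackets on the groupby object sidesteps both issues (no renaming needed and no stray .df); a sketch, still using the pre-0.18 pd.rolling_mean from Alexander's answer:

df['MA3'] = df.groupby('Ticker')['Adj Close'].transform(lambda group: pd.rolling_mean(group, window=3))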

– Excaliburst
  • Can you please give an example of the desired calcs? – Alexander May 17 '16 at 16:19
  • Also, what version of Pandas are you using? – Alexander May 17 '16 at 16:37
  • First error is because you erroneously added a df between ticker and Adj Close. Second error is because I renamed Adj Close to Adj_Close because it is then easier to work with. Everywhere you see it in my code, replace it with ['Adj Close'] and you will be fine (be sure to include the square brackets). – Alexander May 19 '16 at 13:49
  • btw, to filter on a symbol: `df.loc[df.Ticker == 'mmm']` – Alexander May 19 '16 at 16:39
  • Sure, Alexander, but making a filter like `df_ticker = df.loc[df.Ticker == ticker]` would imply doing some form of appending and concatenating. I wanted the groupby functionality to do away with this complexity and keep all content in its proper place in the original pandas df. – Excaliburst May 20 '16 at 09:22

2 Answers


You can use transform on your groupby object to maintain a column with the same shape:

Here, for example, is the 3-day moving average of the Adj Close (pandas < 0.18.0):

df['MA3'] = df.groupby('Ticker').Adj_Close.transform(lambda group: pd.rolling_mean(group, window=3))

>>> df
          Date  Ticker  Open  High  Low  Adj_Close   Volume  MA3
0   2015-04-09  vws.co   315   316  312        312  1686800  NaN
1   2015-04-10  vws.co   317   320  316        313  1396500  NaN
2   2015-04-13  vws.co   318   322  315        316  1564500  313
3   2015-04-14  vws.co   320   322  319        315  1370600  314
4   2015-04-15  vws.co   320   322  319        316   945000  316
5   2015-04-16  vws.co   319   320  310        308  2236100  313
6   2015-04-17  vws.co   310   310  302        299  2711900  308
7   2015-04-20  vws.co   303   312  303        306  1629700  304
8   2016-03-31     mmm   167   168  166        167  1762800  NaN
9   2016-04-01     mmm   166   168  165        168  1993700  NaN
10  2016-04-04     mmm   167   167  166        166  2022800  167
11  2016-04-05     mmm   165   167  165        166  1610300  167
12  2016-04-06     mmm   165   167  165        167  2092200  166
13  2016-04-07     mmm   166   167  165        167  2721900  167
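
Note that pd.rolling_mean was removed in pandas 0.18; on newer versions an equivalent sketch (assuming the same renamed Adj_Close column) calls the rolling method inside transform:

df['MA3'] = df.groupby('Ticker')['Adj_Close'].transform(lambda s: s.rolling(window=3).mean())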
– Alexander

Use groupby

Setup

import pandas as pd
from StringIO import StringIO

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), delim_whitespace=True, parse_dates=[0], index_col=0)
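
This setup is written for Python 2; on Python 3 only the import changes (and print becomes a function):

from io import StringIO  # Python 3 equivalent of the StringIO import above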

Looks like:

print df

            Ticker        Open        High         Low   Adj_Close   Volume
Date                                                                       
2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900

Solution

df.groupby('Ticker').sum()

           Open         High          Low    Adj_Close    Volume
Ticker                                                          
mmm      995.89  1003.590011   991.949981  1000.339998  12203700
vws.co  2521.80  2543.400000  2497.900000  2484.550000  13541100

You can aggregate and do many things with the groupby object.
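
For instance, a couple of other per-ticker aggregations on the same groupby object (a sketch):

# mean adjusted close and total traded volume per ticker
df.groupby('Ticker').agg({'Adj_Close': 'mean', 'Volume': 'sum'})

# several statistics of one column at once
df.groupby('Ticker')['Adj_Close'].agg(['min', 'max', 'mean'])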

– piRSquared
  • OK, I am aware of the groupby function. But I need to separate out, or select, only one ticker at a time for the calculations. I have no problem with the calculations, only with selecting a single ticker, doing the calc on those rows, saving them back to the df, and moving on to the next ticker. This is to ensure that my calcs do not cross from one ticker to another, which obviously has nothing to do with the first ticker. For instance, a rolling mean traversing different companies does not work!!! – Excaliburst May 18 '16 at 07:31
  • So this is a task of filtering/selection, so that I only focus on calculations for one ticker at a time (for ticker in stocklist...). I know how to iterate, but not how to properly filter so that I only work with one ticker at a time. – Excaliburst May 18 '16 at 07:35
  • @Excaliburst There are many things which can be done with groupby and other methods. I gave a general answer because I thought it might be helpful to you, despite the fact that your question was and is vague. If you can articulate what you are trying to do more clearly, I or others can be more helpful. As per your original question, "How do I calculate on a per-ticker basis?": that is answered by using a groupby. If you want to calculate per ticker and per date, then that is different and can also be addressed. You just need to be as specific as you can be. – piRSquared May 18 '16 at 08:01
  • ...and Alexander. Thanks a lot. Now I get it. Groupby IS my filter, with which I can do the calculations and return the results only to the pertinent rows of the df... Wow, I didn't know this. Thanks. – Excaliburst May 18 '16 at 08:49