Pandas and stocks: From daily values (in columns) to monthly values (in rows)

Question

I am having trouble reformatting a dataframe.

My input is a day value rows by symbols columns (each symbol has different dates with it's values):

code to generate input

data = [("01-01-2010", 15, 10), ("02-01-2010", 16, 11), ("03-01-2010", 16.5, 10.5)]
labels = ["date", "AAPL", "AMZN"]
df_input = pd.DataFrame.from_records(data, columns=labels)

The needed output is (month row with new row for each month):

Needed output

code to generate output

data = [("01-01-2010","29-01-2010", "AAPL", 15, 20), ("01-01-2010","29-01-2010", "AMZN", 10, 15),("02-02-2010","30-02-2010", "AAPL", 20, 32)]
labels = ['bd start month', 'bd end month','stock', 'start_month_value', "end_month_value"]
df = pd.DataFrame.from_records(data, columns=labels)

Meaning (Pseudo code) 1. for each row take only non nan values to create a new "row" (maybe dictionary with the date as the index and the [stock, value] as the value. 2. take only rows that are business start of month or business end of month. 3. write those rows to a new datatframe.

I have read several posts like this and this and several more. All treat with dataframe of the same "type" and just resampling while I need to change to structure...

My code so far

# creating the new index with business days
df1 =pd.DataFrame(range(10000), index = pd.date_range(df.iloc[0].name, periods=10000, freq='D'))
from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
df2 = df1.resample(bmth_us).mean()

# creating the new index interseting my old one (daily) with the monthly index
new_index = df.index.intersection(df2.index)

# selecting only the rows I want
df = df.loc[new_index]

# creating a dict that will be my new dataset
new_dict = collections.OrderedDict()
# iterating over the rows and adding to dictionary
for index, row in df.iterrows():
#     print index
    date = df.loc[index].name
    # values are the not none values
    values = df.loc[index][~df.loc[index].isnull().values]

    new_dict[date]=values


# from dict to list
data=[]
for key, values in new_dict.iteritems():
    for i in range(0, len(values)):
        date = key
        stock_name = str(values.index[i])
        stock_value = values.iloc[i]
        row = (key, stock_name, stock_value)
        data.append(row)

# from the list to df
labels = ['date','stock', 'value']
df = pd.DataFrame.from_records(data, columns=labels)
df.to_excel("migdal_format.xls")

Current output I get

One big problem:

I only get value of the stock on the start of month day.. I need start and end so I can calculate the stock gain on this month..

One smaller problem:

I am sure this is not the cleanest and fastest code :)

Thanks a lot!

Can you add [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) ? — jezrael, Jul 13 '18 at 07:39
No, because is not possible copy data, please check [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — jezrael, Jul 13 '18 at 07:58
I have managed to get both start and end of month dates (I think) with this code.. but still l dont know how to take those two values and put in the same column — koren maliniak, Jul 13 '18 at 07:59
Pictures as code or data example are **never** good. How do you expect that someone can reproduce your code from a picture? Can *you* run your pictures in a console to get the code/data? ;) Btw.: I'm interested in getting stock-data. Where did you get that data from? — JE_Muc, Jul 13 '18 at 07:59
added code to generate output. working on input. sucks that people down vote so quickly. — koren maliniak, Jul 13 '18 at 08:09
Thanks! Is there any advantage of Intrinio over AlphaVantage? Your input code does not span a whole month, so it is impossible to generate start and end values of a month. Besides that it is a `RangeIndex`. You should be working with a `DatetimeIndex` to get the full pandas-functionality. — JE_Muc, Jul 13 '18 at 08:26
@korenmaliniak - Not sure how you get output `df` from `df_input`, I guess data are different. — jezrael, Jul 13 '18 at 08:41
It is not the same data.. you want me to make 35 rows of data for the daily input? it's not enough what I did to understand the framework? I can attach my real input data as a file.. but I dont have the output data.. if I had - this would not be a question — koren maliniak, Jul 13 '18 at 10:08

score 0 · Answer 1 · answered Jul 13 '18 at 12:19

So I have found a way.

looping through each column
groupby month
taking the first and last value I have in that month

calculate return

df_migdal = pd.DataFrame() for col in df_input.columns[0:]: stock_position = df_input.loc[:,col]

name = stock_position.name
name = re.sub('[^a-zA-Z]+', '', name)
name = name[0:-4]


stock_position=stock_position.groupby([pd.TimeGrouper('M')]).agg(['first', 'last'])

stock_position["name"] = name
stock_position["return"] = ((stock_position["last"] / stock_position["first"]) - 1) * 100
stock_position.dropna(inplace=True)
df_migdal=df_migdal.append(stock_position)
df_migdal=df_migdal.round(decimals=2)

I tried I way cooler way, but did not know how to handle the ,multi index I got... I needed that for each column, to take the two sub columns and create a third one from some lambda function.

df_input.groupby([pd.TimeGrouper('M')]).agg(['first', 'last'])

Pandas and stocks: From daily values (in columns) to monthly values (in rows)

1 Answers1