Pycharm problem set (Stuck from step 3 onwards)

Question

Using the ff_monthly.csv data set https://github.com/alexpetralia/fama_french, use the first column as an index (this contains the year and month of the data as a string
Create a new column ‘Mkt’ as ‘Mkt-RF’ + ‘RF’
Create two new columns in the loaded DataFrame, ‘Month’ and ‘Year’ to contain the year and month of the dataset extracted from the index column.
Create a new DataFrame with columns ‘Mean’ and ‘Standard Deviation’ and the full set of years from (b) above.
Write a function which accepts (r_m,s_m) the monthy mean and standard deviation of a return series and returns a tuple (r_a,s_a), the annualised mean and standard deviation. Use the formulae: r_a = (1+r_m)^12 -1, and s_a = s_m * 12^0.5.
Loop through each year in the data, and calculate the annualised mean and standard deviation of the new ‘Mkt’ column, storing each in the newly created DataFrame. Note that the values in the input file are % returns, and need to be divided by 100 to return decimals (i.e the value for August 2022 represents a return of -3.78%). . Print the DataFrame and output it to a csv file.

Workings so far:

import pandas as pd

ff_monthly=pd.read_csv(r"file path")

ff_monthly=pd.read_csv(r"file path",index_col=0)

Mkt=ff_monthly['Mkt-RF']+ff_monthly['RF']

ff_monthly= ff_monthly.assign(Mkt=Mkt)

df=pd.DataFrame(ff_monthly)

Hi and welcome to stackoverflow. pandas has excellent Getting Started guides. Especially [How to derivce new columns from existing ones](https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html) looks like exactly what you need I can highly recommend diving into them if you want to learn about pandas :) — maow, Oct 23 '22 at 12:14
Thank you for the resources. However, what I am confused about is how to split an index column into two columns (i.e. 'Date' column which contains year and month as a string and split them into 'Month' and 'Year' columns. — XYZ, Oct 23 '22 at 12:31
You technically do not need to split the index. The easiest way would be to create a column `Month` by extracting the month. And then you create a second column `Year` by extracting the year. — maow, Oct 23 '22 at 12:34
I try df[['Month','Year']] = df.Date.str.split("_", expand=True)but it said AttributeError: 'DataFrame' object has no attribute 'Date' — XYZ, Oct 23 '22 at 12:41
I did the following: from datetime import datetime data_object= datetime.strptime(str(df.index), "%Y%m") but value error occur: does not match format '%Y%m' — XYZ, Oct 23 '22 at 13:40

score 1 · Answer 1 · answered Oct 23 '22 at 12:54

There are a few things to pay attention to.

The Date is the index of your DataFrame. This is treated in a special way compared to the normal columns. This is the reason df.Date gives an Attribute error. Date is not an Attribute, but the index. Instead try df.index
df.Date.str.split("_", expand=True) would work if your Date would look like 22_10. However according to your picture it doesn't contain an underscore and also contains the day, so this cannot work
In fact the format you have is not even following any standard. In order to properly deal with that the best way would be parsing this to a proper datetime64[ns] type that pandas will understand with df.index = pd.to_datetime(df.index, format='%y%m%d'). See the python docu for supported format strings.

If all this works, it should be rather straightforward to create the columns

df['year'] = df.index.dt.year

In fact, this part has been asked before

1 Answers1