I have been trying to give you a reproducible example, but the Period objects prevent me from reading the dictionary back in as a dataframe. So I just converted my dataframe to a dictionary and copied it here (if you can tell me how to make it reproducible, I will edit this). The data is a Series (one column of values) with a two-level index:

{(Period('2020-01', 'M'), False): 213,
 (Period('2020-01', 'M'), True): 21,
 (Period('2020-02', 'M'), False): 313,
 (Period('2020-02', 'M'), True): 13,
 (Period('2020-03', 'M'), False): 213,
 (Period('2020-03', 'M'), True): 23,
 (Period('2020-04', 'M'), False): 213,
 (Period('2020-04', 'M'), True): 12,
 (Period('2020-05', 'M'), False): 321,
 (Period('2020-05', 'M'), True): 121,
 (Period('2020-06', 'M'), False): 321,
 (Period('2020-06', 'M'), True): 22,
 (Period('2020-07', 'M'), False): 333,
 (Period('2020-07', 'M'), True): 11}

I'm trying to create a second column holding each row's percentage of its monthly total (grouped by index level 0). This is what I have so far:

df["new_column"] = df.groupby(level=0).apply(lambda x: x/sum(x))

The error:

DateParseError: Unknown datetime string format, unable to parse: new_columns

It seems that groupby cannot recognize the date format when assigning the percentages to new_column. Why?

Chris
  • It would be better if you showed the dataframe before you converted the column to a period. Please [create a reproducible copy of the DataFrame with `df.head(20).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard), [edit] the question, and paste the clipboard into a code block. – Trenton McKinney Jul 09 '20 at 21:37

1 Answer

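I think that you just need to add [0] after the groupby call, so that the division is applied to the column labelled 0 rather than to the whole object. On recent pandas versions I would also pass group_keys=False, so that the result keeps the original index and lines up with df when it is assigned.

df['new column'] = df.groupby(level=0, group_keys=False)[0].apply(lambda x: x / x.sum())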
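
For context, the [0] only makes sense when df is a DataFrame whose single column is labelled 0, which is what pandas produces when an unnamed Series is wrapped in a DataFrame. Here is a minimal sketch of that distinction, using a shortened copy of the question's data (the names `s` and `frame` are just for illustration):

```python
import pandas as pd

# Shortened copy of the question's data: a Series with a (Period, bool) MultiIndex
s = pd.Series({
    (pd.Period("2020-01", "M"), False): 213,
    (pd.Period("2020-01", "M"), True): 21,
})

# Wrapping an unnamed Series in a DataFrame yields one column labelled 0
frame = pd.DataFrame(s)

print(type(s).__name__)     # Series
print(list(frame.columns))  # [0]
```

If the object in the question is still a Series, an assignment such as df["new_column"] = ... is treated as setting an index label rather than adding a column, which would explain pandas trying to parse the name as a date. That is my reading of the error, not something the question confirms.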

Just in case, I have copied all the code I used below.

import pandas as pd

dictionary = ({(pd.Period('2020-01', 'M'), False): 213,
(pd.Period('2020-01', 'M'), True): 21,
(pd.Period('2020-02', 'M'), False): 313,
(pd.Period('2020-02', 'M'), True): 13,
(pd.Period('2020-03', 'M'), False): 213,
(pd.Period('2020-03', 'M'), True): 23,
(pd.Period('2020-04', 'M'), False): 213,
(pd.Period('2020-04', 'M'), True): 12,
(pd.Period('2020-05', 'M'), False): 321,
(pd.Period('2020-05', 'M'), True): 121,
(pd.Period('2020-06', 'M'), False): 321,
(pd.Period('2020-06', 'M'), True): 22,
(pd.Period('2020-07', 'M'), False): 333,
(pd.Period('2020-07', 'M'), True): 11})

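# Wrapping the Series in a DataFrame gives a single column labelled 0
df = pd.DataFrame(pd.Series(dictionary))

# Share of each row within its month (index level 0); group_keys=False keeps the original index
df['new column'] = df.groupby(level=0, group_keys=False)[0].apply(lambda x: x / x.sum())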
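
As an alternative sketch (not what I used above), the same percentages can be computed with transform instead of apply. transform("sum") returns the group total on the original index, so the division aligns row by row without any extra options. The column names "count" and "share" are my own choice, and the data is a shortened copy of the question's:

```python
import pandas as pd

# Shortened copy of the question's data
counts = pd.Series({
    (pd.Period("2020-01", "M"), False): 213,
    (pd.Period("2020-01", "M"), True): 21,
    (pd.Period("2020-02", "M"), False): 313,
    (pd.Period("2020-02", "M"), True): 13,
})

# Name the column explicitly instead of relying on the default label 0
df = counts.to_frame(name="count")

# Monthly total repeated on every row of that month (same index as df)
monthly_total = df.groupby(level=0)["count"].transform("sum")

# Fraction of the monthly total contributed by each row
df["share"] = df["count"] / monthly_total

print(df)
```

Within each month the values in "share" add up to 1; multiply by 100 if you want percentages rather than fractions.
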
rhug123