Split pandas column and create new columns that count the split values and group by months

Question

I have a dataframe that looks like this

  Error \nNumber       Date                    Type of error 
0             2122 2020-01-09  NHS Spine check - External error
1             2123 2020-01-09                  EP3- Run failure
2             2124 2020-02-09  NHS Spine check - External error
3             2125 2020-03-09  NHS Spine check - External error
4             2126 2020-04-09  NHS Spine check - External error
..             ...        ...                               ...
837           2949 2023-03-07                   DE - Data Entry
838           2950 2023-03-07      EI - Error of interpretation
839           2951 2023-03-07       EX  -External error - other
840           2952 2023-04-07      EP8- SOPs not being followed
841           2953 2023-06-07                        OT - Other

Here a reproducible data set converted in a dictionary

data.head().to_dict()

{'Error \nNumber': {0: '2122', 1: '2123', 2: '2124', 3: '2125', 4: '2126'},
 'Date': {0: Timestamp('2020-01-09 00:00:00'),
  1: Timestamp('2020-01-09 00:00:00'),
  2: Timestamp('2020-02-09 00:00:00'),
  3: Timestamp('2020-03-09 00:00:00'),
  4: Timestamp('2020-04-09 00:00:00')},
 'Type of error ': {0: 'NHS Spine check - External error',
  1: 'EP3- Run failure',
  2: 'NHS Spine check - External error',
  3: 'NHS Spine check - External error',
  4: 'NHS Spine check - External error'}}

I am trying to count the values in column "Type of error", group them by months and split data in new columns like this (following example only takes the first 5 rows

Date       NHS Spine check - External error       EP3- Run failure
2020-09    4                                      1

I have tried the answers found here but the differences prevent me to do what I want

score 1 · Accepted Answer · answered Jul 14 '23 at 12:32

df = (pd.DataFrame(data)
   .drop("Error Number", axis=1)
   .assign(Date=lambda x: pd.to_datetime(x["Date"]).dt.strftime("%Y-%d"))
   .groupby(["Date", "Type of error"])
   .size()
   .reset_index()
   .pivot(index="Date", columns="Type of error", values=0)
)

Type of error  DE - Data Entry  EI - Error of interpretation  \
Date                                                           
2020-09                    NaN                           NaN   
2023-07                    1.0                           1.0   

Type of error  EP3- Run failure  EP8- SOPs not being followed  \
Date                                                            
2020-09                     1.0                           NaN   
2023-07                     NaN                           1.0   

Type of error  EX  -External error - other  NHS Spine check - External error  \
Date                                                                           
2020-09                                NaN                               4.0   
2023-07                                1.0                               NaN   

Type of error  OT - Other  
Date                       
2020-09               NaN  
2023-07               1.0

Your data is in YDM format, which Pandas was thinking was YMD. Rather than correct it, I just used it to my advantage — Mark, Jul 14 '23 at 20:35

Split pandas column and create new columns that count the split values and group by months

1 Answers1