I have some data that has come from some monte carlo simulations and basically I want to reduce my initial dataset (quite large) to a smaller one so plotting is easier. The data I want is represented in strings and there can be multiple entries for each simulation, but there is a chance that every string I want may not have been output during that specific simulation. In this case I would like for that to be represented in a column as a '0'.
Basically I would like for the following to happen:
Run_Number Event
1 1 A
2 1 A
3 1 B
4 1 C
5 2 A
6 2 A
7 2 B
8 3 A
Becomes:
Run_Number A_cnt B_cnt C_cnt
1 1 2 1 1
2 2 2 1 0
3 3 1 0 0
I've tried grouping the data by the event type and grabbing the size of each event, but if a certain event never happens then I won't get a value for that event and I will end up with incompatible column lengths.
Here is what I am doing, but it is pretty slow
dataframe_collection = {}
#error is one of 3 cases I'm running
for error in Blue['Error_Profile']:
error_df = Blue.loc[Blue['Error_Profile'] == error]
Blue_foo = error_df.groupby(['Run_Number', 'Event_Type'], as_index=False).size()
Blue_bar = error_df.groupby(['Run_Number', 'Result'], as_index=False).size()
nRuns_blue = Blue['Run_Number'].unique()
nEventA_blue = []
nEventB_blue = []
nEventC_blue = []
nEventD_blue = []
for run in nRuns_blue:
eventA= Blue_foo[(Blue_foo['Run_Number'] == run) & (Blue_foo['Event_Type'] == 'A')]
eventB= Blue_foo [(Blue_foo ['Run_Number'] == run) & (Blue_foo ['Event_Type'] == 'B')]
eventC= Blue_foo [(Blue_foo ['Run_Number'] == run) & (Blue_foo ['Event_Type'] == 'C')]
eventD= Blue_bar [(Blue_bar ['Run_Number'] == run) & (Blue_bar ['Result'] == 'D')]
nEventA_blue append(0) if eventA.empty else nEventA_blue.append(eventA['size'].item())
nEventB_blue.append(0) if eventB.empty else nEventB_blue .append(eventB['size'].item())
nEventC_blue.append(0) if eventC.empty else nEventC_blue.append(eventC['size'].item())
nEventD_blue.append(0) if eventD.empty else nEventD_blue.append(eventD['size'].item())
data = {'A_cnt': nEventA_blue , 'B_cnt': nEventB_blue , 'C_cnt': nEventC_blue , 'D_cnt': nEventD_blue }
dataframe_collection[error] = pd.DataFrame(data, index=nRuns_blue)