This is my pandas DataFrame with the original column names:
old_dt_cm1_tt old_dm_cm1 old_rr_cm2_epf old_gt
1 3 0 0
2 1 1 5
- Firstly, I want to extract all unique variations of cm, e.g. in this case cm1 and cm2.
- After this, I want to create a new column for each unique cm. In this example there should be 2 new columns.
- Finally, in each new column I should store the total count of non-zero original column values, i.e.
old_dt_cm1_tt old_dm_cm1 old_rr_cm2_epf old_gt cm1 cm2
1 3 0 0 2 0
2 1 1 5 2 1
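For a single variation, step 3 boils down to a row-wise count of non-zero values. A minimal sketch, with the cm1 column names hard-coded purely to illustrate the counting (the automatic version should not hard-code them):

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Columns belonging to cm1, hard-coded only for this illustration.
cm1_cols = ["old_dt_cm1_tt", "old_dm_cm1"]

# ne(0) marks non-zero cells as True; summing across each row counts them.
df["cm1"] = df[cm1_cols].ne(0).sum(axis=1)
print(df)
```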
I implemented the first step as follows:
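import pandas as pd
# keep only the columns whose name contains "cm"
ind = [c for c in df.columns if 'cm' in c]
# .ix was removed in pandas 1.0; .loc does the same label-based selection here
df.loc[:, ind].columns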
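The snippet above filters the columns that contain cm but does not yet yield the distinct variations themselves. One way to get them automatically is a regular expression on the column names. This sketch assumes every variation is "cm" followed by one or more digits; a plain substring test like 'cm' in c would also match unrelated names such as "acme".

```python
import pandas as pd

df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Extract "cm" plus digits from each column name; names without a match become NaN.
# Assumption: variations always follow the pattern cm<digits>.
variations = df.columns.str.extract(r"(cm\d+)", expand=False).dropna().unique()
print(list(variations))  # ['cm1', 'cm2']
```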
How can I proceed with steps 2 and 3 so that the solution is automatic? I don't want to manually define the column names cm1 and cm2, because in the original data set I might have many cm variations.
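Steps 2 and 3 can be combined in a short loop over the extracted variations. A minimal sketch, again assuming variations match cm<digits>; the mask is computed against the original column Index so the newly added count columns never feed back into the counts.

```python
import pandas as pd

df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Snapshot of the original columns (an Index is immutable, so later
# column assignments on df do not change it).
original_cols = df.columns

# Variation for each original column, NaN where the name has none.
# Assumption: a variation is "cm" followed by one or more digits.
variation = original_cols.str.extract(r"(cm\d+)", expand=False)

for cm in variation.dropna().unique():
    # Original columns that belong to this variation.
    cols = original_cols[variation == cm]
    # Step 2 + 3: new column holding the row-wise count of non-zero values.
    df[cm] = df[cols].ne(0).sum(axis=1)

print(df)
```

A plain loop is used here instead of grouping the columns with groupby(..., axis=1), which is deprecated in recent pandas versions; the loop also keeps the new columns in the order the variations first appear.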