pivot_table
allows an aggregate function, so if you only add on column containing 1
to you dataframe, you could get the expected result:
newdf = a.assign(val=1).pivot_table(values='val', index='Actual_Data',
columns='Final_Unique', aggfunc=sum, fill_value=0
).reset_index().rename_axis(None, axis=1)
With the sample data it gives:
Actual_Data 12,12 12,3 2,12 3,4 4,12 6,7 7,6 7,7 8,8
0 1,1,1,1,1,14, 1 0 0 0 0 0 0 0 0
1 1,1,12,2,2,4, 0 0 1 0 0 0 0 0 0
2 1,14,14,1,1,2, 0 0 0 0 1 0 0 0 0
3 14,14,14,14,14,14, 0 0 0 0 0 0 1 0 0
4 2,12,3,4,12,12, 0 0 0 0 0 0 0 1 0
5 6,6,6,6,3,14, 0 0 0 1 0 0 0 0 0
6 6,7,7,7,6,7, 0 0 0 0 0 1 0 0 0
7 6,8,8,8,8,12, 0 1 0 0 0 0 0 0 0
8 8,8,8,8,8,8, 0 0 0 0 0 0 0 0 1
From you comments, the two columns of the original a
dataframe are not directly related but are in fact two independant Series, and you just want to know how many times a Final_Unique
value occurs in a Actual_Data
one.
This code should give it:
newdf = a[['Actual_Data']]
for col in set(a['Final_Unique'].dropna()):
newdf[col] = newdf.Actual_Data.str.findall('(^|,)'+col+'(?=,)').apply(len)
The result is:
Actual_Data 8,8 12,3 12,12 7,7 2,12 6,7 4,12 3,4 7,6
0 8,8,8,8,8,8, 3 0 0 0 0 0 0 0 0
1 6,7,7,7,6,7, 0 0 0 1 0 2 0 0 1
2 2,12,3,4,12,12, 0 1 1 0 2 0 1 1 0
3 14,14,14,14,14,14, 0 0 0 0 0 0 0 0 0
4 1,1,12,2,2,4, 0 0 0 0 0 0 0 0 0
5 6,8,8,8,8,12, 2 0 0 0 0 0 0 0 0
6 6,6,6,6,3,14, 0 0 0 0 0 0 0 0 0
7 1,14,14,1,1,2, 0 0 0 0 0 0 0 0 0
8 1,1,1,1,1,14, 0 0 0 0 0 0 0 0 0
The rationale behind the above code is:
- compute the list of columns from the
Final_Unique
columns: the set
will automatically remove duplicates (but will break order) and dropna()
will remove empty values
- create a new dataframe from the
Actual_Data
column.
- add columns to that dataframe from the list computed at previous step and for each column
- use
str.findall
and an appropriate regex to find non overlapping sequence containing the column name after a comma or at beginning of string and followed by a comma - that last comma not being included in the matched string to allow consecutive sequences to be correctly counted, hence the (?=...)
- apply the len function to tranform that list in a simple count