I have a DataFrame containing LDA topic-distribution outputs along with other demographic information, as below:
import pandas as pd

single_df = pd.DataFrame([
    {"department": "marketing",   "LDA_1": 0.252, "LDA_2": 0.002, "LDA_3": 0.50},
    {"department": "engineering", "LDA_1": 0.478, "LDA_2": 0.152, "LDA_3": 0.492},
    {"department": "cooperate",   "LDA_1": 0.52,  "LDA_2": 0.780, "LDA_3": 0.50},
    {"department": "marketing",   "LDA_1": 0.352, "LDA_2": 0.052, "LDA_3": 0.20},
])
I would like to end up with the final DataFrame below, which has one row per pair of rows (i, j). How do I write a function that calculates the Jensen-Shannon distance between two rows (using only the columns whose names contain "LDA_") and produces this DataFrame?
i  j  same_department  distance_LDA
0  1  0                0.23
0  2  0                0.43
0  3  1                0.26
1  2  0                0.24
1  3  0                0.11
2  3  0                0.29
I've written code to calculate the JS distance for a single pair of rows, as below. How do I turn it into a function?
from scipy.spatial import distance

array = single_df.filter(regex='LDA').to_numpy()
distance.jensenshannon(array[0], array[1])
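A minimal sketch of wrapping that snippet in a function, assuming rows are addressed by position (which matches the labels here because `single_df` has the default RangeIndex). The name `js_distance` and its `prefix` parameter are hypothetical. Note that the LDA rows in this example do not sum to 1; `scipy.spatial.distance.jensenshannon` rescales each input vector to sum to 1 before computing the distance, so the rows are treated as normalized distributions.

```python
import pandas as pd
from scipy.spatial import distance

single_df = pd.DataFrame([
    {"department": "marketing",   "LDA_1": 0.252, "LDA_2": 0.002, "LDA_3": 0.50},
    {"department": "engineering", "LDA_1": 0.478, "LDA_2": 0.152, "LDA_3": 0.492},
    {"department": "cooperate",   "LDA_1": 0.52,  "LDA_2": 0.780, "LDA_3": 0.50},
    {"department": "marketing",   "LDA_1": 0.352, "LDA_2": 0.052, "LDA_3": 0.20},
])


def js_distance(df, i, j, prefix="LDA_"):
    """Hypothetical helper: Jensen-Shannon distance between rows i and j (by position).

    Only columns whose name contains `prefix` are used.
    """
    # filter(like=...) keeps the columns whose name contains the substring
    topics = df.filter(like=prefix).to_numpy()
    # jensenshannon normalizes each vector to sum to 1 internally (natural log by default)
    return distance.jensenshannon(topics[i], topics[j])


print(js_distance(single_df, 0, 1))
```

`filter(like="LDA_")` is used instead of `regex='LDA'` only to match the "LDA_" naming in the question; both select the same three columns here.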
Then, to check whether two rows share the same department, I have the code below:
def same_department(i, j):
    # i and j are rows (e.g. single_df.iloc[0]); returns 1 if they share a department, else 0
    if i['department'] == j['department']:
        return 1
    else:
        return 0
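One way to combine both pieces into the requested pairwise DataFrame is to loop over every unordered pair of row positions with `itertools.combinations`, which yields (0, 1), (0, 2), ..., (2, 3) in the same order as the target table. This is a sketch, not the only approach; the function name `pairwise_lda_distances` is hypothetical, and the same-department check is inlined as `int(a == b)`, which gives the same 1/0 result as `same_department` above.

```python
from itertools import combinations

import pandas as pd
from scipy.spatial import distance

single_df = pd.DataFrame([
    {"department": "marketing",   "LDA_1": 0.252, "LDA_2": 0.002, "LDA_3": 0.50},
    {"department": "engineering", "LDA_1": 0.478, "LDA_2": 0.152, "LDA_3": 0.492},
    {"department": "cooperate",   "LDA_1": 0.52,  "LDA_2": 0.780, "LDA_3": 0.50},
    {"department": "marketing",   "LDA_1": 0.352, "LDA_2": 0.052, "LDA_3": 0.20},
])


def pairwise_lda_distances(df, prefix="LDA_"):
    """Hypothetical helper: one output row per unordered pair (i, j) with i < j."""
    topics = df.filter(like=prefix).to_numpy()   # LDA columns only
    departments = df["department"].to_numpy()
    records = []
    # combinations(range(n), 2) yields every pair of positions exactly once, with i < j
    for i, j in combinations(range(len(df)), 2):
        records.append({
            "i": i,
            "j": j,
            # 1 if both rows share a department, else 0 (same result as same_department)
            "same_department": int(departments[i] == departments[j]),
            "distance_LDA": distance.jensenshannon(topics[i], topics[j]),
        })
    return pd.DataFrame(records)


result = pairwise_lda_distances(single_df)
print(result.round(2))
```

For larger frames, recent SciPy versions also accept `metric='jensenshannon'` in `scipy.spatial.distance.pdist`, whose condensed output lists the pairs in the same (0, 1), (0, 2), ... order, so it could replace the Python loop for the distance column.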