I have a dataset that I quantized to 10 levels in Python. It looks like this:
9 9 1 8 9 1
1 9 3 6 1 0
8 3 8 4 4 1
0 2 1 9 9 0
This means that the sample (9 9 1 8 9) belongs to class 1; the last column is the class label. I want to find the entropy of each feature (column). I wrote the following code, but it has many errors:
import pandas as pd
import math
f = open ( 'data1.txt' , 'r')
# Finding the probability
df = pd.DataFrame(pd.read_csv(f, sep='\t', header=None, names=['val1',
'val2', 'val3', 'val4','val5', 'val6', 'val7', 'val8']))
df.loc[:,"val1":"val5"] = df.loc[:,"val1":"val5"].div(df.sum(axis=0),
axis=1)
# Calculating Entropy
def shannon(col):
    entropy = - sum([ p * math.log(p) / math.log(2.0) for p in col])
    return entropy
sh_df = df.loc[:,'val1':'val5'].apply(shannon,axis=0)
Can you correct my code, or do you know of a function for finding the entropy of each column of a dataset in Python?
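
For reference, a minimal sketch of one way the per-column entropy could be computed. It assumes "entropy of a feature" means the Shannon entropy, in bits, of the empirical distribution of the quantized values in that column, H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of level i. Under that reading, the probabilities come from counting how often each level occurs rather than from dividing the raw values by the column sum, and levels that never occur are simply skipped, which avoids log(0). The column names, the `label` column, and the whitespace separator are assumptions made for illustration; the four rows are the sample shown above, inlined in place of data1.txt.

```python
import io

import numpy as np
import pandas as pd

# The four sample rows from the question; the last column is the class label.
raw = """9 9 1 8 9 1
1 9 3 6 1 0
8 3 8 4 4 1
0 2 1 9 9 0
"""

# Hypothetical column names; the separator is assumed to be whitespace
# (r"\s+" matches both spaces and tabs).
names = ["val1", "val2", "val3", "val4", "val5", "label"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=names)


def shannon(col):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    # Relative frequency of each level that actually occurs, so every p > 0.
    p = col.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())


# Entropy of each feature column, leaving out the class label.
features = df.drop(columns="label")
entropies = features.apply(shannon, axis=0)
print(entropies)
```

If SciPy is available, passing the per-level counts of a column to `scipy.stats.entropy` with `base=2` should give the same values, since that function normalizes its input to probabilities.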