Let me explain. My df looks like this:
id  text                            c1
1   Hello world how are you people   1
2   Hello people I am fine people    1
3   Good Morning people             -1
4   Good Evening                    -1
c1 contains only two values: 1 or -1.
Now I want a dataframe (output) like this:
Word     Totalcount  Points  PercentageOfPointAndTotalCount
hello    2            2       100
world    1            1       100
how      1            1       100
are      1            1       100
you      1            1       100
people   3            1       33.33
I        1            1       100
am       1            1       100
fine     1            1       100
Good     2           -2      -100
Morning  1           -1      -100
Evening  1           -1      -100
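Here, Totalcount is the number of rows of the text column in which each word appears. A word repeated within a single row counts once, which is why people gets 3 and not 4.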
Points is the sum of c1 over the rows that contain each word. Example: the word people is in two rows where c1 is 1 and one row where c1 is -1, so its Points value is just 1 (2 - 1 = 1).
PercentageOfPointAndTotalCount = Points/TotalCount*100
print(df)
      id comment_text  target
0  59848  Hello world    -1.0
1  59849  Hello world    -1.0
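
For reference, the definitions of Totalcount, Points and PercentageOfPointAndTotalCount given earlier could be sketched in pandas roughly as follows. This is only one possible approach, not a definitive solution. It assumes the sample column names text and c1 (for the real frame, comment_text and target would be substituted), that id is unique per row, that words are split on whitespace, and that a word repeated within one row is counted once. The intermediate name words and the result name out are made up for illustration. Case is left untouched, so Hello stays capitalized.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you people",
        "Hello people I am fine people",
        "Good Morning people",
        "Good Evening",
    ],
    "c1": [1, 1, -1, -1],
})

# One row per (id, word). A word repeated inside the same row is kept once;
# this is the assumption that makes "people" come out as 3 rather than 4.
words = (
    df.assign(Word=df["text"].str.split())
      .explode("Word")
      .drop_duplicates(subset=["id", "Word"])
)

# Totalcount = number of rows containing the word, Points = sum of c1 over those rows.
out = (
    words.groupby("Word", sort=False)
         .agg(Totalcount=("c1", "count"), Points=("c1", "sum"))
         .reset_index()
)

# Percentage as defined in the question: Points / Totalcount * 100.
out["PercentageOfPointAndTotalCount"] = (
    out["Points"] / out["Totalcount"] * 100
).round(2)

print(out)
```

If every occurrence should count, including repeats within one row, dropping the drop_duplicates step would give that variant instead; Points would then also add c1 once per occurrence.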