I created a co-occurrence matrix using the answers to two earlier questions (question 1 and question 2). Similar questions about this error did not help me resolve it.
However, some of the probabilities exceed 1, and I get ValueError: probabilities do not sum to 1
Please let me know how I can share a piece of the df for reproducibility.
I generated the co-occurrence matrix using this code:
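import pandas as pd

# Create matrix: one row and one column per word, initialised to 0
my_df = pd.DataFrame(0, columns=words, index=words)
# Keys of frequency_list are indexed as (row word, column word); values are counts
for k, v in frequency_list.items():
    my_df.at[k[0], k[1]] = v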
which gives me a 10000 x 10000 matrix.
Then I converted the counts into relative frequencies by dividing each row by its row sum:
row_sums = my_df.values.sum(axis = 1)
row_sums[row_sums == 0] = 1
my_prob = my_df/row_sums.reshape((-1,1))
my_prob
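A minimal sketch on a made-up 3 x 3 matrix may help show what this normalization does and does not guarantee. The words and counts below are hypothetical, not taken from the real data. After dividing by the row sums, every non-empty row adds up to 1, but nothing forces the columns to do the same:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; these words and counts are illustrative only.
words = ["a", "b", "c"]
counts = pd.DataFrame(
    [[0, 3, 1],
     [2, 0, 2],
     [0, 4, 0]],
    index=words, columns=words, dtype=float,
)

# Same normalization as above: divide each row by its row sum.
row_sums = counts.values.sum(axis=1)
row_sums[row_sums == 0] = 1  # avoid division by zero for all-zero rows
probs = counts / row_sums.reshape((-1, 1))

row_totals = probs.sum(axis=1)  # one total per row
col_totals = probs.sum()        # DataFrame.sum() defaults to axis=0: one total per column

print(row_totals)
print(col_totals)
```

In this toy case the row totals are all 1, while the column total for "b" is above 1, because several rows put most of their mass on that column.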
When I print the column sums for the last 30 words
my_prob.sum().tail(30)
I get a value above 1:
“thy 0.000000
“till 0.002538
**“to 1.109681**
I tried to normalize the probabilities.
First, I pick the word "the" and convert its column to a list:
word_the = my_string_prob['the'].tolist()
Then I rescale the values so that they sum to 1:
sum_of_elements = sum(word_the)
a = 1/sum_of_elements
my_probs_scaled = [e*a for e in word_the]
my_probs_scaled
sum(my_probs_scaled)
# Output: 1.000000000000005
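For comparison, here is a sketch of the same rescaling done with NumPy and then used for sampling. It assumes the ValueError comes from NumPy's random choice, whose message matches the one quoted above; the weights are hypothetical stand-ins for one word's values. As far as I know, a rounding error on the order of 1e-15 is normally within the tolerance of that check, so a vector rescaled this way should usually be accepted:

```python
import numpy as np

# Hypothetical weights standing in for one word's values; not the real data.
weights = np.array([0.2, 0.5, 0.0, 0.41], dtype=np.float64)

# Rescale so the vector sums to 1 (up to floating-point rounding).
probs = weights / weights.sum()

# Draw one index according to these probabilities.
rng = np.random.default_rng(0)
idx = rng.choice(len(probs), p=probs)

print(probs.sum(), idx)
```

If the error still appears after this kind of rescaling, it may be worth checking whether the vector passed to the sampler is the one that was rescaled, and whether it is a row or a column of the matrix.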
The same code worked on the smaller, simpler matrix from one of the questions linked above. Thanks!