
I'm trying to compute a cumulative distribution and store it in a dictionary. It should take the letters from a given text, compute each letter's probability from the number of times it appears in the text, and from that compute the cumulative distribution. I don't know if I'm doing it the right way, but here's my code:

with open('text') as infile:
    text = infile.read()

letters = list(text)
letter_freqs = Counter(letters(text))
letter_sum = len(letters) 
letter_proba = [letter_freqs[letter]/letter_sum for letter in letters(text)]

And now I want to calculate the cumulative distribution and plot it like a histogram. Can someone help me?

py.codan
  • Check out [Scipy](http://en.wikipedia.org/wiki/SciPy). [Here](http://docs.scipy.org/doc/scipy-0.14.0/reference/index.html)'s a link to the API Reference. – Noob Saibot Jan 15 '15 at 22:19
  • @NoobSaibot What's this? – py.codan Jan 15 '15 at 22:21
  • The uses of `letters(text)` are broken (`letters` is a `list`, **not** callable, yet you're trying to call it). And, over **what** sequence do you want to cumulate? `letters` itself? `sorted(set(letters))`? `itertools.accumulate` can do the accumulation, of course -- but as a sequence, and "plotting a dictionary" seems weird anyway since a dictionary has no order... – Alex Martelli Jan 15 '15 at 22:27
  • What would be best to do in my situation, @AlexMartelli? Can you give me some example code? – py.codan Jan 15 '15 at 22:31
  • @py.codan, sure, see my answer. If you edit your Q to specify the problem with precision, the answer can change accordingly. As for plotting, see e.g http://stackoverflow.com/questions/12303501/python-plot-simple-histogram-given-binned-data -- but it doesn't "plot a dictionary" (?!), it of course plots a histogram presented as a **sequence** (a dictionary has no order, so how would you plot it?!) – Alex Martelli Jan 15 '15 at 22:36

1 Answer


The following should at least run (which your code as posted won't):

import collections, itertools

with open('text') as infile:
    letters = list(infile.read())  # not just letters: whitespace & punct, too
    letter_freqs = collections.Counter(letters)
    letter_sum = len(letters)
    letters_set = sorted(set(letters))
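    # probability of each distinct character (the loop variable is l)
    d = {l: letter_freqs[l]/letter_sum for l in letters_set}
    # running total of the probabilities, taken in sorted character order
    cum = itertools.accumulate(d[l] for l in letters_set)
    cum_d = dict(zip(letters_set, cum))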

Now cum_d is a dictionary mapping each character (not just letters, of course, since you've done nothing to exclude whitespace and punctuation) to the cumulative probability of that character and all those below it in alphabetical order. How you plan to "plot" a dictionary, no idea. But hey, at least this does run, and produces something that might fit at least one interpretation of the vague specs you give for the task!-)
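If only letters are wanted, one possible variation is sketched below. It is an assumption about the intent rather than something the question states: it filters with `str.isalpha()`, folds case with `lower()`, and uses a hypothetical in-memory `sample` string in place of the `text` file so that it runs on its own.

```python
import collections
import itertools

# Hypothetical sample text, standing in for the contents of the 'text' file.
sample = "Hello, World!"

# Keep alphabetic characters only, lower-cased so 'H' and 'h' count together
# (an assumption; the question does not say how case should be treated).
letters = [ch.lower() for ch in sample if ch.isalpha()]
letter_freqs = collections.Counter(letters)
letter_sum = len(letters)

# The sorted distinct letters fix the order in which probabilities accumulate.
letters_sorted = sorted(letter_freqs)
probs = [letter_freqs[l] / letter_sum for l in letters_sorted]
cum_d = dict(zip(letters_sorted, itertools.accumulate(probs)))

for letter, cum_prob in cum_d.items():
    print(letter, round(cum_prob, 3))
```

The last value is 1.0 up to floating-point rounding, which is a quick sanity check that the probabilities were accumulated over every letter.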

Alex Martelli
  • Thanks @Alex, I'm going to try this. I don't know how I planned to plot a dictionary... I'm new to Python, so I make some mistakes. – py.codan Jan 15 '15 at 22:39
  • @py.codan You should take a look at matplotlib. It is a python library for plotting. It can generate histograms. – Félix Cantournet Jan 15 '15 at 22:43
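
A rough sketch of the matplotlib suggestion, for illustration only. The helper names `cumulative_points` and `plot_cumulative` are hypothetical, the letters-only filtering is an assumption, and matplotlib is a third-party package that has to be installed separately. The cumulative probabilities are passed to the plot as an ordered sequence, not as a dictionary.

```python
import collections
import itertools


def cumulative_points(text):
    """Return (labels, cumulative probabilities) for the letters in text."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    freqs = collections.Counter(letters)
    labels = sorted(freqs)  # fixed order for both the axis and the running sum
    total = len(letters)
    cum = list(itertools.accumulate(freqs[l] / total for l in labels))
    return labels, cum


def plot_cumulative(text):
    """Draw the cumulative distribution of letters in text as a bar chart."""
    # Imported here so cumulative_points() works without matplotlib installed.
    import matplotlib.pyplot as plt

    labels, cum = cumulative_points(text)
    plt.bar(range(len(labels)), cum, tick_label=labels)
    plt.ylabel("cumulative probability")
    plt.show()
```

Calling `plot_cumulative(open('text').read())` would then draw one bar per letter, with bar heights rising to 1.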