How to plot a histogram in matplotlib when the data is in tuples?

Question

I need to plot a histogram of the 5 most frequently occurring words in a list. I used Counter.most_common() from the collections module, which gave me the following tuples:

[('you', 7706), ('i', 6570), ('we', 2733), ('my', 2718), ('he', 2369)]

How can I plot a histogram when the data is in the format ('word', frequency)?

The format that I am familiar with is: ['you', 'you', 'you', ... , 'i', 'i', 'i', ... , etc.]

I know that I could repeat each word by its count to build a new list in the format I am familiar with and plot that, but I feel there has to be a more efficient way to do this.

Mr. T · Accepted Answer · 2018-11-25T07:37:50.540

5

Unzip your list of tuples:

from matplotlib import pyplot as plt

a = [('you', 7706), ('i', 6570), ('we', 2733), ('my', 2718), ('he', 2369)]

plt.bar(*zip(*a))
plt.show()
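To see what the unzip step does on its own, here is a minimal sketch using only the standard library. The short word list and the names `words`, `pairs`, `labels` and `counts` are made up for illustration; they are not from the question.

```python
from collections import Counter

# Hypothetical word list standing in for the asker's real text.
words = ["you", "i", "you", "we", "you", "i"]

# most_common(n) returns (word, count) pairs, highest count first.
pairs = Counter(words).most_common(2)
print(pairs)  # [('you', 3), ('i', 2)]

# zip(*pairs) transposes the pairs into one tuple of labels and one of counts.
labels, counts = zip(*pairs)
print(labels)  # ('you', 'i')
print(counts)  # (3, 2)
```

The outer `*` in `plt.bar(*zip(*a))` then passes those two tuples as the first two positional arguments of `plt.bar`, so the words become the bar positions and the counts become the bar heights.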

Sample output:

edited Nov 25 '18 at 07:37

answered Nov 25 '18 at 07:28

Mr. T

score 1 · Answer 2 · answered Nov 25 '18 at 07:17

You can use a matplotlib bar chart:

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()  # reset any customized rc settings to matplotlib's defaults

items = [('you', 7706), ('i', 6570), ('we', 2733), ('my', 2718), ('he', 2369)]
y_pos = np.arange(len(items))  # one bar position per word
# Bar heights are the counts; the words become the tick labels
plt.bar(y_pos, [x[1] for x in items], align='center', alpha=0.5)
plt.xticks(y_pos, [x[0] for x in items])

plt.show()

With the result:

score 1 · Answer 3 · answered Nov 25 '18 at 07:29

I prefer pandas for easy manipulation of data and plotting:

import matplotlib.pyplot as plt
import pandas

freqs = [('you', 7706), ('i', 6570), ('we', 2733), ('my', 2718), ('he', 2369)]

# Create a DataFrame for the data, with names for the columns
freqdf = pandas.DataFrame(freqs, columns=['Word', 'Count']).set_index('Word')
freqdf.plot.barh()
plt.show()  # needed when running as a script rather than in a notebook

Resulting plot:

answered Nov 25 '18 at 07:29

chthonicdaemon

Thank you, but unfortunately this was for an exercise in matplotlib. – Jacob Myer, Nov 25 '18 at 07:49

score 1 · Answer 4 · answered Nov 25 '18 at 07:37

Here's an extension of the above solution using Matplotlib as well as Seaborn:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

lst = [('you', 7706), ('i', 6570), ('we', 2733), ('my', 2718), ('he', 2369)]

# Unzip the pairs into a list of words and a list of counts
val, cnt = map(list, zip(*lst))
# val == ['you', 'i', 'we', 'my', 'he'], cnt == [7706, 6570, 2733, 2718, 2369]

# using Matplotlib
positions = np.arange(len(cnt))
plt.bar(positions, cnt)
plt.xticks(positions, val)
plt.show()

# using seaborn (x and y passed as keyword arguments, which newer versions require)
sns.barplot(x=val, y=cnt)
plt.show()

Thank you, I would prefer seaborn but I had to use matplotlib for this particular exercise. — Jacob Myer, Nov 25 '18 at 07:50
