My problem is that I'm trying to create a bar plot, but it is not outputting correctly.

I have a list of dictionaries.

The list holds thousands of tweets from Twitter, one dictionary per tweet. Each dictionary stores that tweet's attributes as key:value pairs, including the tweet content, the screen name of the person tweeting, the language of the tweet, the country of origin of the tweet, and more.

To create my bar plot for the language attribute, I use map() with a lambda to pull the language out of each tweet into a column of a pandas DataFrame, then count the values and plot the result as a bar plot with 5 frequency bars, one for each of the top 5 most used languages in my list of tweets.

Here is my code for the language bar plot (note that my list of dictionaries containing each tweet is called tweets_data):

import pandas as pd
import matplotlib.pyplot as plt

tweets_df = pd.DataFrame()

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

tweets_by_lang = tweets_df['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')

As I said, I should be getting 5 bars, one for each of the top five languages in my data. Instead, I am getting the graph shown below.

[image: the resulting bar plot]

TJE
  • The problem is here: `tweets_df['lang'] = map( ... )`. What does `tweets_data` look like? What kind of object is it? If it's a dataframe, why are you mapping it instead of just using `tweets_data['lang'].value_counts()`? – ASGM Oct 19 '17 at 15:03
  • tweets_data is a list, and each item in the list is a dictionary. Each dictionary contains all of the data for a single tweet. And when I try your suggestion of tweets_data['lang'].value_counts() -- I get the error "TypeError: list indices must be integers or slices, not str." – TJE Oct 19 '17 at 15:17
  • What does the output of `print tweets_df['lang']` look like? – ASGM Oct 19 '17 at 15:19
  • 0 en 1 en 2 en 3 en 4 pt 5 sp 6 en 7 und 8 en 9 en 10 en ... 530 en 531 sp 532 en 533 en 534 it 535 en 536 pt 537 en – TJE Oct 19 '17 at 15:22
  • Hmm, it's not immediately clear to me why this isn't working. I've made some sample data and tried it myself and it seems fine. What happens if you replace `tweets_data` with this test list: `tweets_data = [{'lang': 'en'}, {'lang': 'pl'}, {'lang': 'en'}]`? – ASGM Oct 19 '17 at 15:30
  • Actually, you posted a response that worked for me, and then you deleted it! It was this: tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts() -- So if you still have that response, if you put it back I will check your reply as the best/correct response. Thanks! (Also, I would appreciate being able to read your explanation again for what I am doing wrong). – TJE Oct 19 '17 at 15:33
  • I think I'm seeing now that my problem is likely because I borrowed most of this script from a person/blog post that used Python 2, and I'm using Python 3. According to some comments on that blog post, apparently the code works in Python 2.7, but not Python 3.6. – TJE Oct 19 '17 at 15:35
  • Ah, so that's it! I've re-instated the answer and incorporated the reason there. – ASGM Oct 19 '17 at 15:39

1 Answer


Your problem is here:

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

The issue, as your comment suggests, comes down to a change between Python 2 and Python 3. In Python 2, map() returns a list, but in Python 3 it returns a lazy iterator. The hint is that tweets_df['lang'].value_counts() has only one entry, and that entry is the <map ...> iterator object itself.
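
A quick way to see the difference for yourself is to inspect what map() hands back. This is only a sketch: the three-item tweets_data below is made-up sample data standing in for your real list.

```python
# Made-up sample data standing in for the real tweets_data.
tweets_data = [{'lang': 'en'}, {'lang': 'pl'}, {'lang': 'en'}]

langs = map(lambda tweet: tweet['lang'], tweets_data)

# In Python 3 this is a lazy map object, not a list.
is_list = isinstance(langs, list)
print(type(langs).__name__, is_list)

# Consuming the iterator produces the actual language codes.
lang_list = list(langs)
print(lang_list)
```

Under Python 2 the same map() call would already be a list, which is why the original blog code worked there.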

In either Python 2 or 3, you can use a list comprehension instead:

tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts()

Or in Python 3, you can follow @Triptych's advice from the answer linked above and wrap map() in a list():

tweets_df['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
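
As a rough check that both fixes give the same counts, here is a small sketch run on made-up data (the six sample tweets and the names from_comprehension, from_map, and top5 are my own, not from your script):

```python
import pandas as pd

# Made-up sample data standing in for the real tweets_data.
tweets_data = [
    {'lang': 'en'}, {'lang': 'pt'}, {'lang': 'en'},
    {'lang': 'es'}, {'lang': 'en'}, {'lang': 'pt'},
]

# Fix 1: build the Series from a list comprehension.
from_comprehension = pd.Series(
    [tweet['lang'] for tweet in tweets_data]
).value_counts()

# Fix 2: keep map(), but materialize it with list() before assigning.
tweets_df = pd.DataFrame()
tweets_df['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
from_map = tweets_df['lang'].value_counts()

# value_counts() sorts by frequency, so slicing keeps the most common languages.
top5 = from_comprehension[:5]
print(top5)
```

With either version, the slice can be passed to .plot(ax=ax, kind='bar', color='red') exactly as in the question.
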
ASGM