Thanks for posting the code. In the future, please do not upload images of code/errors when asking a question, and try to provide a Minimal, Reproducible Example. I don't have a Watson API key, so I couldn't reproduce your example completely, but what it does is basically the following:
In extractEntities(url) you make an API call to the Watson NLP service and, for each entity found in the response, you create a dictionary with the relevance, sentiment and so on. In the end you return a list of all those dictionaries. Let's make a dummy function to simulate this, based on the code you provided, so that I can try to reproduce the problem you are having:
import random
import pandas as pd

def extractEntities(url):
    article_dict = []  # actually a list, not a dict!
    for entity in ('Senate', 'CNN', 'Hillary Clinton', 'Bill Clinton'):
        initial_dict = {}
        initial_dict['entity'] = entity
        initial_dict['url'] = url
        initial_dict['source'] = url.split('.')[1]
        initial_dict['relevance'] = random.random()
        initial_dict['sentiment'] = random.random()
        article_dict.append(initial_dict)
    return article_dict  # returns a list of dictionaries
Sample output is a list of dictionaries:
>>> extractEntities('https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html')
[{'entity': 'Senate',
'relevance': 0.4000160139190754,
'sentiment': 0.012884391182820587,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'CNN',
'relevance': 0.44921272670354884,
'sentiment': 0.40996636370319894,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Hillary Clinton',
'relevance': 0.4892046288027784,
'sentiment': 0.5424038672663258,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Bill Clinton',
'relevance': 0.7237361288162582,
'sentiment': 0.8269245953553733,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'}]
Now you have a list of URLs in allurls3 and do the following:

- You create an empty list, confusingly named dict1.
- You loop over the URLs in allurls3.
- You call extractEntities on each URL; data3 now holds a list of dictionaries (see above).
- You append that list of dictionaries to the list dict1.

The end result dict1 is a list of lists of dictionaries:
>>> allurls3 = ['https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html', 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305']
>>> dict1 = []
>>> for u in range(len(allurls3)):
...     data3 = []
...     url3 = allurls3[u]
...     data3 = extractEntities(url3)
...     dict1.append(data3)
...
>>> dict1
[[{'entity': 'Senate',
'relevance': 0.19115763152061027,
'sentiment': 0.557935869111337,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'CNN',
'relevance': 0.9259134250004917,
'sentiment': 0.8605677705216526,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Hillary Clinton',
'relevance': 0.6071084891165042,
'sentiment': 0.04296592154310419,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Bill Clinton',
'relevance': 0.9558183603396242,
'sentiment': 0.42813857092335783,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'}],
[{'entity': 'Senate',
'relevance': 0.5060582500660554,
'sentiment': 0.9240451580369043,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'CNN',
'relevance': 0.03956002547473547,
'sentiment': 0.5337343576461046,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'Hillary Clinton',
'relevance': 0.6706912125534789,
'sentiment': 0.7721987482202004,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'Bill Clinton',
'relevance': 0.37377943134631464,
'sentiment': 0.7114485187747178,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'}]]
And finally you wrap this list of lists of dictionaries, dict1, in yet another list and pass that to pandas to turn it into a DataFrame:
>>> pd.set_option('max_colwidth', 800)
>>> articles_df1 = pd.DataFrame([dict1])
>>> articles_df1

OK, now that I have been able to reproduce your error, I can tell you how to fix it. You know from the first image you posted that you need to provide pd.DataFrame with a list of dictionaries, not with a list of a list of lists of dictionaries as you are doing now. Also, naming a list dict1 is very confusing. So instead, do the following. The key difference is to use extend instead of append.
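To see the difference between the two methods, here is a minimal example with plain list literals (nothing from your code):

```python
a = [1, 2]
b = [1, 2]
a.append([3, 4])  # adds the list itself as a single element
b.extend([3, 4])  # adds each element of the list individually
print(a)  # [1, 2, [3, 4]]
print(b)  # [1, 2, 3, 4]
```

append nests, extend flattens by one level, which is exactly what we need here.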
>>> entities = []
>>> for url3 in allurls3:
...     data3 = extractEntities(url3)
...     entities.extend(data3)
...
>>> entities
[{'entity': 'Senate',
'relevance': 0.11594421982738612,
'sentiment': 0.2917557430217993,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'CNN',
'relevance': 0.5741596155387597,
'sentiment': 0.7743716765722405,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Hillary Clinton',
'relevance': 0.2535272395046557,
'sentiment': 0.2570270764910251,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Bill Clinton',
'relevance': 0.2275111369786037,
'sentiment': 0.03312536097047081,
'source': 'cnn',
'url': 'https://us.cnn.com/2020/09/15/politics/donald-trump-biden-retweet/index.html'},
{'entity': 'Senate',
'relevance': 0.8197309413723833,
'sentiment': 0.9492436947284604,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'CNN',
'relevance': 0.7317312596198684,
'sentiment': 0.5052344447199512,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'Hillary Clinton',
'relevance': 0.3572239446181651,
'sentiment': 0.056131606725058014,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'},
{'entity': 'Bill Clinton',
'relevance': 0.761777835912902,
'sentiment': 0.28138007550393573,
'source': 'wsj',
'url': 'https://www.wsj.com/articles/hurricane-sally-barrels-into-alabama-11600252305'}]
Now you have a single flat list of dictionaries that you can use to create a DataFrame:
>>> pd.set_option('max_colwidth', 800)
>>> articles_df1 = pd.DataFrame(entities)
>>> articles_df1
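As a side note, the whole loop can be written more compactly with itertools.chain.from_iterable. This is just a sketch; to keep it runnable on its own it redefines a cut-down stand-in for the dummy extractEntities above (only two entities and two fields), so substitute your real function and URL list:

```python
from itertools import chain
import pandas as pd

def extractEntities(url):
    # cut-down stand-in for the dummy function above
    return [{'entity': e, 'url': url} for e in ('Senate', 'CNN')]

allurls3 = ['https://us.cnn.com/a.html', 'https://www.wsj.com/b']

# chain.from_iterable flattens the per-URL lists into one flat list of dicts
entities = list(chain.from_iterable(extractEntities(url3) for url3 in allurls3))
articles_df1 = pd.DataFrame(entities)
print(articles_df1.shape)  # (4, 2)
```

This avoids the intermediate append/extend bookkeeping entirely.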
