In your original code the line
dic[country]= dic[country]+1
should raise a KeyError, because the key is not yet present in the dictionary when a country is encountered for the first time. Instead, you should check whether the key is present and, if it is not, initialize the value to 1.
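A minimal sketch of that check, assuming a short hypothetical list `words` in place of the words from your text:

```python
# Hypothetical input; stands in for the words taken from your text.
words = ["Finland", "Japan", "Finland"]

dic = {}
for country in words:
    if country in dic:
        # Key already present: increment the existing count.
        dic[country] += 1
    else:
        # First occurrence: initialize the count to 1.
        dic[country] = 1

print(dic)  # {'Finland': 2, 'Japan': 1}
```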
On the other hand, it will not, because the check
if country in country_codes['English short name lower case']:
yields False for all values: a Series object's __contains__ checks the index labels, not the values. You could, for example, check
if country in country_codes['English short name lower case'].values:
if your list of values is short.
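The difference is easy to see on a small Series. This is only a sketch: `names` is a hypothetical stand-in for the country name column, with the default integer index.

```python
import pandas as pd

# Hypothetical Series standing in for the country name column.
names = pd.Series(["Finland", "Japan"])

# Membership on the Series itself tests the index labels (0 and 1 here).
print("Finland" in names)         # False: "Finland" is not an index label
print(0 in names)                 # True: 0 is an index label

# Membership on .values tests the stored data instead.
print("Finland" in names.values)  # True
```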
For general counting tasks, Python provides collections.Counter, which acts a bit like a defaultdict(int), but with added benefits. It removes the need to check for keys manually.
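As a rough illustration of those benefits, here is a sketch with a hypothetical list of words: a missing key simply counts as zero, and most_common gives the counts in descending order.

```python
from collections import Counter

# Hypothetical input; any iterable of hashable items works.
counts = Counter(["Finland", "Japan", "Finland"])

print(counts["Finland"])      # 2
print(counts["Russia"])       # 0: missing keys count as zero, no KeyError
print(counts.most_common(1))  # [('Finland', 2)]
```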
As you already have DataFrame objects, you could use the tools pandas provides:
In [12]: country_codes = pd.read_csv('wikipedia-iso-country-codes.csv')
In [13]: text = pd.DataFrame({'SomeText': """Finland , Finland , Finland
...: The country where I want to be
...: Pony trekking or camping or just watch T.V.
...: Finland , Finland , Finland
...: It's the country for me
...:
...: You're so near to Russia
...: so far away from Japan
...: Quite a long way from Cairo
...: lots of miles from Vietnam
...:
...: Finland , Finland , Finland
...: The country where I want to be
...: Eating breakfast or dinner
...: or snack lunch in the hall
...: Finland , Finland , Finland
...: Finland has it all
...:
...: Read more: Monty Python - Finland Lyrics | MetroLyrics
...: """.split()})
In [14]: text[text['SomeText'].isin(
...: country_codes['English short name lower case']
...: )]['SomeText'].value_counts().to_dict()
...:
Out[14]: {'Finland': 14, 'Japan': 1}
This finds the rows of text where the SomeText column's value is in the English short name lower case column of country_codes, counts the unique values of SomeText, and converts the result to a dictionary. The same with descriptive intermediate variables:
In [49]: where_sometext_isin_country_codes = text['SomeText'].isin(
...: country_codes['English short name lower case'])
In [50]: filtered_text = text[where_sometext_isin_country_codes]
In [51]: value_counts = filtered_text['SomeText'].value_counts()
In [52]: value_counts.to_dict()
Out[52]: {'Finland': 14, 'Japan': 1}
The same with Counter:
In [23]: from collections import Counter
In [24]: dic = Counter()
...: ccs = set(country_codes['English short name lower case'])
...: for country in text['SomeText']:
...:     if country in ccs:
...:         dic[country] += 1
...:
In [25]: dic
Out[25]: Counter({'Finland': 14, 'Japan': 1})
or simply:
In [30]: ccs = set(country_codes['English short name lower case'])
In [31]: Counter(country for country in text['SomeText'] if country in ccs)
Out[31]: Counter({'Finland': 14, 'Japan': 1})