I have this data frame:
import pandas as pd
df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)
for col in df.columns:
if col[:2]=='01':
df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
if col[:2]=='02':
df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
if col[:2]=='03':
df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
if col[:1]=='№':
df.rename(columns={col:'#'+col[1:]}, inplace=True)
names_ids = df.index.str.split('\s\(') # split the index by '('
df.index = names_ids.str[0] # the [0] element is the country name (new index)
df['ID'] = names_ids.str[1].str[:3] # the [1] element is the abbreviation or ID (take first 3 characters from that)
df = df.drop('Totals')
df.head()
The olympics.csv
file is actually this list.
Here is a picture of the DataFrame (The first five countries):
Here is the code of the first ten countries:
{'# Summer': {'Afghanistan': 13,
'Algeria': 12,
'Argentina': 23,
'Armenia': 5,
'Australasia': 2,
'Australia': 25,
'Austria': 26,
'Azerbaijan': 5,
'Bahamas': 15,
'Bahrain': 8},
'Gold': {'Afghanistan': 0,
'Algeria': 5,
'Argentina': 18,
'Armenia': 1,
'Australasia': 3,
'Australia': 139,
'Austria': 18,
'Azerbaijan': 6,
'Bahamas': 5,
'Bahrain': 0},
'Silver': {'Afghanistan': 0,
'Algeria': 2,
'Argentina': 24,
'Armenia': 2,
'Australasia': 4,
'Australia': 152,
'Austria': 33,
'Azerbaijan': 5,
'Bahamas': 2,
'Bahrain': 0},
'Bronze': {'Afghanistan': 2,
'Algeria': 8,
'Argentina': 28,
'Armenia': 9,
'Australasia': 5,
'Australia': 177,
'Austria': 35,
'Azerbaijan': 15,
'Bahamas': 5,
'Bahrain': 1},
'Total': {'Afghanistan': 2,
'Algeria': 15,
'Argentina': 70,
'Armenia': 12,
'Australasia': 12,
'Australia': 468,
'Austria': 86,
'Azerbaijan': 26,
'Bahamas': 12,
'Bahrain': 1},
'# Winter': {'Afghanistan': 0,
'Algeria': 3,
'Argentina': 18,
'Armenia': 6,
'Australasia': 0,
'Australia': 18,
'Austria': 22,
'Azerbaijan': 5,
'Bahamas': 0,
'Bahrain': 0},
'Gold.1': {'Afghanistan': 0,
'Algeria': 0,
'Argentina': 0,
'Armenia': 0,
'Australasia': 0,
'Australia': 5,
'Austria': 59,
'Azerbaijan': 0,
'Bahamas': 0,
'Bahrain': 0},
'Silver.1': {'Afghanistan': 0,
'Algeria': 0,
'Argentina': 0,
'Armenia': 0,
'Australasia': 0,
'Australia': 3,
'Austria': 78,
'Azerbaijan': 0,
'Bahamas': 0,
'Bahrain': 0},
'Bronze.1': {'Afghanistan': 0,
'Algeria': 0,
'Argentina': 0,
'Armenia': 0,
'Australasia': 0,
'Australia': 4,
'Austria': 81,
'Azerbaijan': 0,
'Bahamas': 0,
'Bahrain': 0},
'Total.1': {'Afghanistan': 0,
'Algeria': 0,
'Argentina': 0,
'Armenia': 0,
'Australasia': 0,
'Australia': 12,
'Austria': 218,
'Azerbaijan': 0,
'Bahamas': 0,
'Bahrain': 0},
'# Games': {'Afghanistan': 13,
'Algeria': 15,
'Argentina': 41,
'Armenia': 11,
'Australasia': 2,
'Australia': 43,
'Austria': 48,
'Azerbaijan': 10,
'Bahamas': 15,
'Bahrain': 8},
'Gold.2': {'Afghanistan': 0,
'Algeria': 5,
'Argentina': 18,
'Armenia': 1,
'Australasia': 3,
'Australia': 144,
'Austria': 77,
'Azerbaijan': 6,
'Bahamas': 5,
'Bahrain': 0},
'Silver.2': {'Afghanistan': 0,
'Algeria': 2,
'Argentina': 24,
'Armenia': 2,
'Australasia': 4,
'Australia': 155,
'Austria': 111,
'Azerbaijan': 5,
'Bahamas': 2,
'Bahrain': 0},
'Bronze.2': {'Afghanistan': 2,
'Algeria': 8,
'Argentina': 28,
'Armenia': 9,
'Australasia': 5,
'Australia': 181,
'Austria': 116,
'Azerbaijan': 15,
'Bahamas': 5,
'Bahrain': 1},
'Combined total': {'Afghanistan': 2,
'Algeria': 15,
'Argentina': 70,
'Armenia': 12,
'Australasia': 12,
'Australia': 480,
'Austria': 304,
'Azerbaijan': 26,
'Bahamas': 12,
'Bahrain': 1},
'ID': {'Afghanistan': 'AFG',
'Algeria': 'ALG',
'Argentina': 'ARG',
'Armenia': 'ARM',
'Australasia': 'ANZ',
'Australia': 'AUS',
'Austria': 'AUT',
'Azerbaijan': 'AZE',
'Bahamas': 'BAH',
'Bahrain': 'BRN'}}
Gold
means: Number of summer gold medals.Gold.1
means: Number of winter gold medals.Gold.2
means: Number of ALL the gold medals.
The question is:
Which country has the biggest difference between their summer gold medal counts and winter gold medal counts relative to their total gold medal count?
To make it clearer:
(Which means: finding the country which had the maximum number of summer gold medals AND minimum number of winter gold medals AND minimum number of total gold medals).
Only include countries that have won at least 1 gold in both summer and winter.
This function should return a single string value.
My solution:
I tried to solve this problem using two steps:
First:
Filtering only the countries which have gold medals (both in summer and winter).
This is the closest code I've reached:
df[(df['Gold'] > 0) & (df['Gold.1'] > 0)]
But it returns a DataFrame, not a list.
Trying this code:
(df['Gold'] > 0) & (df['Gold.1'] > 0)
I've got a list of ALL the countries, and beside them True
or False
.
So it actually doesn't filter anything.
and also:
print(df.loc[df['Gold'] == True])
which seemed to give a wrong answer.
This is what I've got:
(Note that this list doesn't contain United States, for example).
Second:
Finding the highest value of the fraction:
Well, the codes:
(df['Gold'] - df['Gold.1']).idxmax()
and
df['Gold.2'].idxmin()
are interpreted just fine,
BUT:
(df['Gold'] - df['Gold.1']).idxmax() & df['Gold.2'].idxmin()
Gives me an error:
TypeError: unsupported operand type(s) for &: 'str' and 'str'
I've tried also this code:
df.loc[df['Gold'] > 0 & (df['Gold'] - df['Gold.1']).idxmax() & df['Gold.2'].idxmin()]
which gave me the same error:
TypeError: unsupported operand type(s) for &: 'int' and 'str'
Another problem which I thought about, but didn't get to this step because of the errors, is:
How is the fraction should be presented: a float number? something else?
I looked also here, here, here, here and here.
I found the following functions relevant:
where()
, mask()
, query()
, isin()
, all()
etc.
But it seems to be a tough question, which non of the above links truly helped me to solve it.
Thanks a lot.