I have a DataFrame, df, that looks like this (it contains about 200 rows):
Name,Country,Population
Chicago,USA,2694240
Luanda,Angola,8329798
Seattle,USA,783137
Meishan,China,723267
Colombo,Sri Lanka,612535
Taishan,China,524937
Faisalabad,Pakistan,3462295
Houston,USA,2340890
Rajshahi,Bangladesh,907732
Shaoyang,China,713559
Zamboanga City,Philippines,917477
Meerut,India,1696440
Mangalore,India,713357
Beira,Mozambique,569911
Samsun,Turkey,642592
Anqiu,China,682299
Jerusalem,Israel,931756
Zahedan,Iran,609873
Algiers,Algeria,2767661
Johannesburg,South Africa,5782747
Celaya,Mexico,680971
Kitwe,Zambia,685908
Da Nang,Vietnam,1125316
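To make the question reproducible, the rows shown above can be loaded straight into a DataFrame. This is a minimal sketch, assuming pandas is installed; csv_text is a hypothetical name, and it holds only these 23 sample rows, not the full ~200-row data.

```python
import io

import pandas as pd

# The sample rows shown above, copied as CSV text (not the full dataset)
csv_text = """Name,Country,Population
Chicago,USA,2694240
Luanda,Angola,8329798
Seattle,USA,783137
Meishan,China,723267
Colombo,Sri Lanka,612535
Taishan,China,524937
Faisalabad,Pakistan,3462295
Houston,USA,2340890
Rajshahi,Bangladesh,907732
Shaoyang,China,713559
Zamboanga City,Philippines,917477
Meerut,India,1696440
Mangalore,India,713357
Beira,Mozambique,569911
Samsun,Turkey,642592
Anqiu,China,682299
Jerusalem,Israel,931756
Zahedan,Iran,609873
Algiers,Algeria,2767661
Johannesburg,South Africa,5782747
Celaya,Mexico,680971
Kitwe,Zambia,685908
Da Nang,Vietnam,1125316
"""

# Parse the CSV text into the DataFrame used in the question
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```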
I want to randomly select rows, grouped by country, into another DataFrame called sampled_df. For the random selection, I am using the code below:
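import numpy as np

if condition == 'True':
    # Draw len(df) row labels at random (np.random.choice samples with replacement by default)
    rows = np.random.choice(df.index.values, len(df))
    # Keep at most the first 5 drawn rows for each country
    sampled_df = df.loc[rows].groupby('Country').head(5)
    print(sampled_df)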
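One likely source of the 0-row case: np.random.choice samples with replacement by default, so drawing len(df) labels repeats some rows (the same city can appear twice) and leaves others out entirely; a country with a single row is skipped whenever its label is never drawn. The sketch below uses a small hypothetical subset of the rows above and compares this with replace=False, which produces a random permutation, so every country keeps all of its rows before head(5) is applied.

```python
import numpy as np
import pandas as pd

# A small hypothetical subset of the question's data, for illustration only
df = pd.DataFrame({
    'Name': ['Chicago', 'Seattle', 'Houston', 'Meishan', 'Taishan',
             'Shaoyang', 'Anqiu', 'Meerut', 'Mangalore', 'Luanda'],
    'Country': ['USA', 'USA', 'USA', 'China', 'China',
                'China', 'China', 'India', 'India', 'Angola'],
    'Population': [2694240, 783137, 2340890, 723267, 524937,
                   713559, 682299, 1696440, 713357, 8329798],
})

# Default behaviour: WITH replacement, so duplicates and omissions are likely
with_repl = np.random.choice(df.index.values, len(df))
print('distinct rows drawn with replacement:', len(set(with_repl.tolist())))

# replace=False: every row label appears exactly once, in random order
perm = np.random.choice(df.index.values, len(df), replace=False)
# head(5) now returns min(5, rows in that country) distinct rows per country
sampled_df = df.loc[perm].groupby('Country').head(5)
print(sampled_df.groupby('Country').size())
```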
Output:
Name Country Population
Chicago USA 2694240
Seattle USA 783137
Houston USA 2340890
Louisville USA 624890
Indianapolis USA 875929
Using the code above, I get at most 5 city names per country, or 0 in the worst case (correct me if I am wrong). What I want is for the selection to contain at most 5 and at least 3 elements per country. How can I add this filter?
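A minimal sketch of two possible readings, assuming "at least 3" means countries with fewer than 3 rows should be dropped, since they cannot supply 3 distinct cities. sample_per_country and sample_random_size are hypothetical helper names, and the data is a small subset of the rows above. The first shuffles without replacement, keeps up to 5 rows per country, then uses groupby(...).filter to drop countries left with fewer than 3. The second instead draws a random count between 3 and 5 (capped by the country's row count) for each eligible country.

```python
import numpy as np
import pandas as pd

# A small hypothetical subset of the question's data, for illustration only
df = pd.DataFrame({
    'Name': ['Chicago', 'Seattle', 'Houston', 'Meishan', 'Taishan',
             'Shaoyang', 'Anqiu', 'Meerut', 'Mangalore', 'Luanda'],
    'Country': ['USA', 'USA', 'USA', 'China', 'China',
                'China', 'China', 'India', 'India', 'Angola'],
    'Population': [2694240, 783137, 2340890, 723267, 524937,
                   713559, 682299, 1696440, 713357, 8329798],
})


def sample_per_country(df, min_n=3, max_n=5, seed=None):
    """Up to max_n random rows per country; drop countries with < min_n rows."""
    # Shuffle every row exactly once (sampling without replacement)
    shuffled = df.sample(frac=1, random_state=seed)
    # Keep at most max_n rows per country
    capped = shuffled.groupby('Country').head(max_n)
    # Drop countries that cannot supply at least min_n rows
    return capped.groupby('Country').filter(lambda g: len(g) >= min_n)


def sample_random_size(df, min_n=3, max_n=5, seed=None):
    """A random number of rows in [min_n, max_n] per eligible country."""
    rng = np.random.default_rng(seed)
    parts = []
    for _, group in df.groupby('Country'):
        if len(group) < min_n:
            continue  # this country cannot supply min_n distinct rows
        # integers() excludes the upper bound, hence the + 1
        n = int(rng.integers(min_n, min(max_n, len(group)) + 1))
        picks = rng.choice(len(group), size=n, replace=False)
        parts.append(group.iloc[picks])
    # Return an empty frame with the same columns if no country qualifies
    return pd.concat(parts) if parts else df.iloc[0:0]


sampled_df = sample_per_country(df, seed=0)
print(sampled_df)
```

With this subset, sample_per_country keeps USA (3 rows) and China (4 rows) and drops India and Angola, whatever the seed; only which China rows appear, and their order, changes.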