
I have a DataFrame, df (about 200 rows), that looks like this:

Name,Country,Population
Chicago,USA,2694240
Luanda,Angola,8329798
Seattle,USA,783137
Meishan,China,723267
Colombo,Sri Lanka,612535
Taishan,China,524937
Faisalabad,Pakistan,3462295
Houston,USA,2340890
Rajshahi,Bangladesh,907732
Shaoyang,China,713559
Zamboanga City,Philippines,917477
Meerut,India,1696440
Mangalore,India,713357
Beira,Mozambique,569911
Samsun,Turkey,642592
Anqiu,China,682299
Jerusalem,Israel,931756
Zahedan,Iran,609873
Algiers,Algeria,2767661
Johannesburg,South Africa,5782747
Celaya,Mexico,680971
Kitwe,Zambia,685908
Da Nang,Vietnam,1125316

I want to randomly select rows, grouped by country, into another DataFrame called sampled_df. For the random selection I am using the code below:

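import numpy as np

if condition == 'True':
    # draw len(df) random row labels (with replacement by default, so repeats are possible)
    rows = np.random.choice(df.index.values, len(df))
    # keep at most 5 rows per country
    sampled_df = df.loc[rows].groupby('Country').head(5)
    print(sampled_df)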

Output

Name            Country     Population
Chicago         USA         2694240
Seattle         USA         783137
Houston         USA         2340890
Louisville      USA         624890
Indianapolis    USA         875929

With the code above I get at most 5 cities per country, or as few as 0 in the worst case (correct me if I am wrong). What I want is for each country's selection to contain at most 5 and at least 3 rows. How can I add this filter?
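On the "correct me if I am wrong" point: np.random.choice samples with replacement unless replace=False is passed, so df.loc[rows] can contain repeated rows and miss others, and a country can come back with fewer rows than it actually has, or none. A minimal sketch comparing that with a shuffle without replacement (df.sample(frac=1)), which keeps every row exactly once; the small frame is a hypothetical subset of the data above:

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data
df = pd.DataFrame({
    "Name": ["Chicago", "Seattle", "Houston", "Meishan", "Taishan", "Meerut"],
    "Country": ["USA", "USA", "USA", "China", "China", "India"],
    "Population": [2694240, 783137, 2340890, 723267, 524937, 1696440],
})

np.random.seed(0)

# With replacement (np.random.choice's default): labels can repeat,
# so some rows, or even a whole country, may be missing from the draw
with_repl = np.random.choice(df.index.values, len(df))
print("distinct rows drawn with replacement:", len(set(with_repl)))

# Without replacement: a full shuffle keeps every row exactly once
shuffled = df.sample(frac=1, random_state=0)
sampled_df = shuffled.groupby("Country").head(5)

# Every country is present, with min(group size, 5) rows each
print(sampled_df["Country"].value_counts())
```

With the shuffle, each country contributes min(its row count, 5) rows, so the worst case is 1 row for a country that appears only once, not 0.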

RoshanShah22

1 Answer


You can use Series.between() on a column:

df['ID'].between(2, 4, inclusive="neither")

This might be useful: How to select a range of values in a pandas dataframe column?
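Applied to the question, between() would go on the per-country row counts rather than on a value column. A minimal sketch, assuming a sample that has already been capped at 5 rows per country; the small frame and the name sizes are hypothetical:

```python
import pandas as pd

# Hypothetical already-capped sample (at most 5 rows per country)
sampled = pd.DataFrame({
    "Name": ["Chicago", "Seattle", "Houston", "Meishan", "Taishan",
             "Shaoyang", "Meerut", "Mangalore"],
    "Country": ["USA"] * 3 + ["China"] * 3 + ["India"] * 2,
})

# Row-aligned group size: each row gets the number of rows in its country
sizes = sampled.groupby("Country")["Name"].transform("size")

# between(3, 5) includes both endpoints by default
result = sampled[sizes.between(3, 5)]
print(result)
```

Here India has only 2 rows, so its rows are dropped, while USA and China (3 rows each) are kept.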

fcdt
I think you might have misunderstood me. In my code, head(5) limits the number of rows returned per country to 5. Depending on df, head(5) may return only two, one, or zero rows for some countries. I only want the cases that return more than 3 and at most 5 rows. I hope that is clearer now. – RoshanShah22 Nov 20 '20 at 09:19
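One possible way to get this, sketched under the assumption that "at least 3" (as stated in the question) is the lower bound: shuffle once, cap each country at 5 rows with groupby().head(5), then keep only countries whose capped group still has at least 3 rows using groupby().filter(). The frame is a hypothetical subset of the question's data, and MIN_ROWS/MAX_ROWS are illustrative names:

```python
import pandas as pd

# Hypothetical subset of the question's data: USA has 5 rows, China 4,
# India 2 and Pakistan 1
df = pd.DataFrame({
    "Name": ["Chicago", "Seattle", "Houston", "Louisville", "Indianapolis",
             "Meishan", "Taishan", "Shaoyang", "Anqiu",
             "Meerut", "Mangalore", "Faisalabad"],
    "Country": ["USA"] * 5 + ["China"] * 4 + ["India"] * 2 + ["Pakistan"],
    "Population": [2694240, 783137, 2340890, 624890, 875929,
                   723267, 524937, 713559, 682299,
                   1696440, 713357, 3462295],
})

MIN_ROWS, MAX_ROWS = 3, 5  # change MIN_ROWS to 4 for "more than 3"

# Shuffle all rows once (no replacement), then cap each country at MAX_ROWS
capped = df.sample(frac=1, random_state=42).groupby("Country").head(MAX_ROWS)

# Drop countries that have fewer than MIN_ROWS rows after capping
sampled_df = capped.groupby("Country").filter(lambda g: len(g) >= MIN_ROWS)
print(sampled_df)
```

Because the cap is applied before the filter, every surviving country has between 3 and 5 rows; countries with only 1 or 2 rows in df (India and Pakistan here) are dropped entirely rather than padded.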