Sorry if this is a duplicate; I could not find an existing question with a matching solution.

I have a very large pandas dataframe called csv_table:

print(csv_table.shape) yields (1155522, 6)

The dataframe looks like this:

                username                                              tweet following followers is_retweet is_bot
0             narutouz16  RT @GetMadz: Sound design in this game is 10/1...        59        20          1      0
1             narutouz16                         @hbthen3rd I know I don't.        59        20          0      0
2             narutouz16  @TonyKelly95 I'm still not satisfied in the en...        59        20          0      0

I need to create a smaller DataFrame containing just the first 20 rows for each username, skipping any username that has fewer than 20 rows.

I have looked at this question which suggests using something like the following:

df.groupby('username').head(20).reset_index(drop=True)

This produces rather good results:

              username                                              tweet following followers is_retweet is_bot
0           narutouz16  RT @GetMadz: Sound design in this game is 10/1...        59        20          1      0
1           narutouz16                         @hbthen3rd I know I don't.        59        20          0      0
2           narutouz16  @TonyKelly95 I'm still not satisfied in the en...        59        20          0      0
3           narutouz16  I'm currently in second place in my leaderboar...        59        20          0      0
4           narutouz16  @TheRealRotimi live footage of us at spin. htt...        59        20          0      0
5           narutouz16  Duolingo has more content than I thought, add ...        59        20          0      0
6           narutouz16                       @TonyKelly95 It dont go down        59        20          0      0
7           narutouz16  This is my meme day, where I explore the inter...        59        20          0      0
8           narutouz16            RT @DitzyFlama: ygHeAZSQkA        59        20          1      0
9           narutouz16  When you turn around and someone went from sin...        59        20          0      0
10          narutouz16  How dare you leave me with a cliffhanger in ch...        59        20          0      0
11          narutouz16                     I'm entering my popular phase.        59        20          0      0
12          narutouz16  RT @gurugurugravity: #ThankYouGameFreak for th...        59        20          1      0
13          narutouz16  @TonyKelly95 I'm pretty sure that guy was just...        59        20          0      0
14          narutouz16  Yeah, when christmas time comes, I'm about to ...        59        20          0      0
15          narutouz16  I don't like higher education, because the las...        59        20          0      0
16          narutouz16  I found a spotify playlist called childhood Bo...        59        20          0      0
17          narutouz16  Theres two type of people in this world. Peopl...        59        20          0      0
18          narutouz16  I just want to let people know just dance 2020...        59        20          0      0
19          narutouz16           RT @AAAAAGGHHHH: PxD2vdLelo        59        20          1      0
20       GamerGrowthHQ  RT @zFakes_: Looking for an editor to make My ...     73508    130115          1      0
21       GamerGrowthHQ  RT @Ltdanmagicleg: I don't just want you in my...     73508    130115          1      0
22       GamerGrowthHQ  RT @MissAliCatt: I'm so tired of people's dram...     73508    130115          1      0
23       GamerGrowthHQ  RT @FrostedCaribou: NEW VIDEO\n\nPulling MOA...       73508    130115          1      0
24       GamerGrowthHQ  RT @adron_foe: People get so up in arms about ...     73508    130115          1      0
25       GamerGrowthHQ  RT @guccipoptart346: Jumping on the #ModernWar...     73508    130115          1      0
26       GamerGrowthHQ  RT @adron_foe: If my dick and my hand are frie...     73508    130115          1      0
27       GamerGrowthHQ  RT @lebazmada: Time for my #livestream on #twi...     73508    130115          1      0
28       GamerGrowthHQ  RT @GamerGrowthHQ: What is your favorite game ...     73508    130115          1      0
29       GamerGrowthHQ  What is your favorite game to play when ur on ...     73508    130115          0      0

What I don't understand is how to add a check that excludes a username entirely if it has fewer than 20 rows in the dataframe.
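To make the gap concrete, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical, and `n = 2` stands in for 20. It shows that `groupby(...).head(n)` on its own still returns rows for a username that has fewer than `n` rows:

```python
import pandas as pd

# Hypothetical toy data: "a" has 3 rows, "b" has only 1
toy = pd.DataFrame({
    "username": ["a", "a", "a", "b"],
    "tweet": ["t1", "t2", "t3", "t4"],
})
n = 2  # stands in for 20

# head(n) caps each group at n rows but does not drop short groups
first_n = toy.groupby("username").head(n).reset_index(drop=True)
kept_users = sorted(first_n["username"].unique())
print(kept_users)  # ['a', 'b'] -- "b" is still present with its single row
```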

artemis

1 Answer

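You can use transform to get each row's per-username row count, keep only the rows whose count is at least n, and then take the first n rows of each remaining group: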

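n = 20
# For every row, the number of rows that share its username
s = df.groupby('username').username.transform('count')
# Drop users with fewer than n rows, then keep the first n rows per user
yourdf = df[s >= n].groupby('username').head(n).reset_index(drop=True)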
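As a rough sketch, the same logic can be wrapped in a function and checked on a small frame. The function name `first_n_per_user`, its `key` parameter, and the toy data are illustrative assumptions, not part of the original question:

```python
import pandas as pd


def first_n_per_user(df, n=20, key="username"):
    """Return the first n rows of every `key` group that has at least n rows."""
    # Per-row count of how many rows share that row's key value
    counts = df.groupby(key)[key].transform("count")
    # Filter out short groups first, then cap each remaining group at n rows
    return df[counts >= n].groupby(key).head(n).reset_index(drop=True)


# Hypothetical toy data: "a" has 3 rows, "b" has 2, "c" has 1
toy = pd.DataFrame({
    "username": ["a", "b", "a", "a", "b", "c"],
    "tweet": ["a1", "b1", "a2", "a3", "b2", "c1"],
})
result = first_n_per_user(toy, n=2)
print(result)
```

With `n=2`, user `c` is dropped and users `a` and `b` each contribute their first two rows. Note that `head` keeps the rows in their original order rather than sorting them by group. `groupby(key).filter(lambda g: len(g) >= n)` expresses the same check, but it calls a Python function once per group, so it is likely to be slower than `transform('count')` on a frame with over a million rows.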
BENY
  • This was simple enough. I converted this to a function, but the logic is correct nonetheless. Thank you. – artemis Nov 18 '19 at 23:53