Sorry if this is a duplicate; I could not find an existing question whose answers cover this case.
I have a very large pandas DataFrame called csv_table:
print(csv_table.shape)
yields (1155522, 6)
The dataframe looks like this:
username tweet following followers is_retweet is_bot
0 narutouz16 RT @GetMadz: Sound design in this game is 10/1... 59 20 1 0
1 narutouz16 @hbthen3rd I know I don't. 59 20 0 0
2 narutouz16 @TonyKelly95 I'm still not satisfied in the en... 59 20 0 0
What I need to do is create a smaller DataFrame with just the first 20 rows for each username, skipping any username that does not have at least 20 rows.
I have looked at this question which suggests using something like the following:
df.groupby('username').head(20).reset_index(drop=True)
This gets me most of the way there:
username tweet following followers is_retweet is_bot
0 narutouz16 RT @GetMadz: Sound design in this game is 10/1... 59 20 1 0
1 narutouz16 @hbthen3rd I know I don't. 59 20 0 0
2 narutouz16 @TonyKelly95 I'm still not satisfied in the en... 59 20 0 0
3 narutouz16 I'm currently in second place in my leaderboar... 59 20 0 0
4 narutouz16 @TheRealRotimi live footage of us at spin. htt... 59 20 0 0
5 narutouz16 Duolingo has more content than I thought, add ... 59 20 0 0
6 narutouz16 @TonyKelly95 It dont go down 59 20 0 0
7 narutouz16 This is my meme day, where I explore the inter... 59 20 0 0
8 narutouz16 RT @DitzyFlama: ygHeAZSQkA 59 20 1 0
9 narutouz16 When you turn around and someone went from sin... 59 20 0 0
10 narutouz16 How dare you leave me with a cliffhanger in ch... 59 20 0 0
11 narutouz16 I'm entering my popular phase. 59 20 0 0
12 narutouz16 RT @gurugurugravity: #ThankYouGameFreak for th... 59 20 1 0
13 narutouz16 @TonyKelly95 I'm pretty sure that guy was just... 59 20 0 0
14 narutouz16 Yeah, when christmas time comes, I'm about to ... 59 20 0 0
15 narutouz16 I don't like higher education, because the las... 59 20 0 0
16 narutouz16 I found a spotify playlist called childhood Bo... 59 20 0 0
17 narutouz16 Theres two type of people in this world. Peopl... 59 20 0 0
18 narutouz16 I just want to let people know just dance 2020... 59 20 0 0
19 narutouz16 RT @AAAAAGGHHHH: PxD2vdLelo 59 20 1 0
20 GamerGrowthHQ RT @zFakes_: Looking for an editor to make My ... 73508 130115 1 0
21 GamerGrowthHQ RT @Ltdanmagicleg: I don't just want you in my... 73508 130115 1 0
22 GamerGrowthHQ RT @MissAliCatt: I'm so tired of people's dram... 73508 130115 1 0
23 GamerGrowthHQ RT @FrostedCaribou: NEW VIDEO\n\nPulling MOA... 73508 130115 1 0
24 GamerGrowthHQ RT @adron_foe: People get so up in arms about ... 73508 130115 1 0
25 GamerGrowthHQ RT @guccipoptart346: Jumping on the #ModernWar... 73508 130115 1 0
26 GamerGrowthHQ RT @adron_foe: If my dick and my hand are frie... 73508 130115 1 0
27 GamerGrowthHQ RT @lebazmada: Time for my #livestream on #twi... 73508 130115 1 0
28 GamerGrowthHQ RT @GamerGrowthHQ: What is your favorite game ... 73508 130115 1 0
29 GamerGrowthHQ What is your favorite game to play when ur on ... 73508 130115 0 0
What I don't understand is how to add a check that excludes a username entirely if it has fewer than 20 rows in the DataFrame.
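To make the desired behavior concrete, here is a minimal sketch of one possible approach: compute each row's group size with transform("size"), keep only rows whose username has enough rows, and then apply head. The toy data and the threshold n = 2 are assumptions chosen for illustration; with the real csv_table the threshold would be 20.

```python
import pandas as pd

# Hypothetical toy data standing in for csv_table:
# user "a" has 3 rows, "b" has 1 row, "c" has 2 rows.
csv_table = pd.DataFrame({
    "username": ["a", "a", "b", "a", "c", "c"],
    "tweet": ["t1", "t2", "t3", "t4", "t5", "t6"],
})

n = 2  # illustrative threshold; this would be 20 for the real data

# Number of rows in each row's username group, aligned to the original index.
group_sizes = csv_table.groupby("username")["username"].transform("size")

# Drop users with fewer than n rows, then keep the first n rows per remaining user.
result = (
    csv_table[group_sizes >= n]
    .groupby("username")
    .head(n)
    .reset_index(drop=True)
)
print(result)
```

In this sketch, user "b" is dropped because it has only one row, while "a" and "c" each contribute their first two rows in their original order. An alternative would be groupby("username").filter(lambda g: len(g) >= n), which calls a Python function once per group and may be slower on a frame with many distinct usernames.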