Here is a mock dataframe:
import pandas as pd

# Two users (IDs) with six posts each
df_test = pd.DataFrame({
    'ID': [8972685, 8972685, 8972685, 8972685, 8972685, 8972685, 9834561, 9834561, 9834561, 9834561, 9834561, 9834561],
    'POST': ['texteghteh', 'tethrtxt', 'tetrhrtxt', 'terthtrxt', 'teetrwxt', 'twetrhext', 'tethdxt', 'texthdt', 'texdhtrt', 'texdthdt', 'tdghgdhtext', 'tthtdext']
})
The real dataframe contains approximately 90,000 distinct users and 28,000,000 rows. Each row contains a post made by some user. What I want is to pick n users from the dataframe along with all of their posts. For example, if I pick the first 500 users and each of them has 1,000 posts, I should get 500,000 rows.
I asked this before and it was immediately marked as a duplicate, which I don't think it is. This is another answer, but I did not manage to apply those solutions successfully. I need it the other way round: the first n groups, regardless of how many entries each group has.
I tried this:
df_test.groupby('ID')['POST'].head(2)
which yields:
0 texteghteh
1 tethrtxt
6 tethdxt
7 texthdt
Name: POST, dtype: object
This gives me the first two posts from each user. What I want instead is the first 2 users together with all of their posts.
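To make the goal concrete, here is a minimal sketch of the result I am after. It assumes that "first n users" means the first n distinct ID values in row order. The helper name first_n_users and its id_col parameter are made up for illustration, and I have not checked how this behaves on the full 28-million-row frame.

```python
import pandas as pd

df_test = pd.DataFrame({
    'ID': [8972685] * 6 + [9834561] * 6,
    'POST': ['texteghteh', 'tethrtxt', 'tetrhrtxt', 'terthtrxt', 'teetrwxt', 'twetrhext',
             'tethdxt', 'texthdt', 'texdhtrt', 'texdthdt', 'tdghgdhtext', 'tthtdext'],
})


def first_n_users(df, n, id_col='ID'):
    """Hypothetical helper: keep every row belonging to the first n distinct IDs."""
    # Series.unique() returns values in order of first appearance,
    # so slicing it gives the first n users as they occur in the frame.
    wanted = df[id_col].unique()[:n]
    # Keep all rows whose ID is one of the selected users.
    return df[df[id_col].isin(wanted)]


one_user = first_n_users(df_test, 1)   # all 6 posts of the first user
two_users = first_n_users(df_test, 2)  # all 12 rows of the mock frame

# Alternative sketch: number the groups in order of appearance
# (sort=False) and keep the rows whose group number is below n.
mask = df_test.groupby('ID', sort=False).ngroup() < 1
same_result = df_test[mask]
```

With n=1 on the mock data, both variants should return only the six rows for ID 8972685. With n=2 the whole mock frame comes back, because it only contains two users.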