I have a data set like this:
query hit score1 score2
group1 hit1 0.9 0.8
group1 hit2 0.2 0.9
group1 hit3 0.2 0.9
group2 hit1 0.1 0.3
group2 hit2 0.6 0.8
group2 hit3 0.3 0.3
group2 hit4 0.7 0.5
I want the output to return the lowest n (e.g. 2) rows, per group, based on a columnn (e.g. score 1), so this would turn into:
query hit score1 score2
group1 hit2 0.2 0.9
group1 hit3 0.2 0.9
group2 hit1 0.1 0.3
group2 hit3 0.3 0.3
I wrote:
df = pd.read_csv('file_name.txt',sep='\t',header=0)
df = df.groupby('query')['score1'].nsmallest(2).join(df,['score1'])
print(df)
based on seeing this answer.
But I get the error series has no attribute join
.
I did this because when I run:
df = pd.read_csv('file_name.txt',sep='\t',header=0)
df = df.groupby('query')['score1'].nsmallest(2)
print(df)
but it only returns the column that the grouping is done by, I wasn't sure how to extend to return the whole row, hence why I tried based on a SO answer that gave me the error.