Csv analysis with pandas: Most frequent response of certain rows

Question

My data has the following structure:

(Each question is repeated six times). Here it is in csv format:

Question,Person,Answer
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_3,p1,1
Q_3,p1,1
Q_3,p1,3
Q_3,p1,1
Q_3,p1,2
Q_3,p1,2

What I'm trying to do is find the most frequent answer for each question and create a new .csv file containing only the most frequent answers. For example, the output should look like this:

I tried using df.mode() to find the most frequent response, but this returns a single value for the whole column 'Answer'. I need to somehow run it every 6th row and get a separate value for each question. How can I do this?

Does this answer your question? [GroupBy pandas DataFrame and select most common value](https://stackoverflow.com/questions/15222754/groupby-pandas-dataframe-and-select-most-common-value) — Ynjxsjmh, Jul 31 '22 at 14:05

score 1 · Accepted Answer · answered Jul 31 '22 at 14:13

Use pandas's groupby and then apply agg(pd.Series.mode) like so:

columns = ["Question","Person","Answer"]
data = [["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_3","p1",1],
["Q_3","p1",1],
["Q_3","p1",3],
["Q_3","p1",1],
["Q_3","p1",2],
["Q_3","p1",2]]

df = pd.DataFrame(data, columns = columns)
df.groupby(["Question"], as_index=False).agg(pd.Series.mode)

Output:

    Question    Person  Answer
0   Q_1          p1       3
1   Q_2          p1       1
2   Q_3          p1       1

Csv analysis with pandas: Most frequent response of certain rows

1 Answers1