1

My data has the following structure:

table

(Each question is repeated six times). Here it is in csv format:

Question,Person,Answer
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_1,p1,3
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_2,p1,1
Q_3,p1,1
Q_3,p1,1
Q_3,p1,3
Q_3,p1,1
Q_3,p1,2
Q_3,p1,2

What I'm trying to do is find the most frequent answer for each question and create a new .csv file containing only the most frequent answers. For example, the output should look like this:

output

I tried using df.mode() to find the most frequent response, but this returns a single value for the whole column 'Answer'. I need to somehow run it every 6th row and get a separate value for each question. How can I do this?

Monika
  • 301
  • 3
  • 12
  • Have you tried `df.groupby` and then apply mode? – Michael S. Jul 31 '22 at 13:52
  • 1
    Does this answer your question? [GroupBy pandas DataFrame and select most common value](https://stackoverflow.com/questions/15222754/groupby-pandas-dataframe-and-select-most-common-value) – Ynjxsjmh Jul 31 '22 at 14:05

1 Answers1

1

Use pandas's groupby and then apply agg(pd.Series.mode) like so:

columns = ["Question","Person","Answer"]
data = [["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_1","p1",3],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_2","p1",1],
["Q_3","p1",1],
["Q_3","p1",1],
["Q_3","p1",3],
["Q_3","p1",1],
["Q_3","p1",2],
["Q_3","p1",2]]

df = pd.DataFrame(data, columns = columns)
df.groupby(["Question"], as_index=False).agg(pd.Series.mode)

Output:

    Question    Person  Answer
0   Q_1          p1       3
1   Q_2          p1       1
2   Q_3          p1       1
Michael S.
  • 3,050
  • 4
  • 19
  • 34