
I have a data frame containing the profession and the question for each item of a questionnaire used in a survey. I'm trying to summarise which questions we used for which profession. The questionnaires differ between professions, but some questions are shared across them, so I want to find out which questions are shared by which professions. Basically, I have this:

profession    question
AAAA          question_a
AAAA          question_b 
BBBB          question_a
BBBB          question_d 
CCCC          question_a
CCCC          question_c 

And I want to get something like this:

question      profession
question_a    AAAA
              BBBB
              CCCC
question_b    AAAA
question_d    BBBB
question_c    CCCC

or perhaps I could get some sort of list or dict in order to use later.

I've tried the command below:

df.groupby(['question','profession']).count()

And gotten the output:

question    profession    other_column_1   other_column_2
question_a  AAAA
            BBBB
.
.
.

The problem is that I can't actually work with this result. I don't know how to access the question and profession fields, how to list the combinations, and so on.

Dumb ML

3 Answers

You are nearly there. All you need to do is assign the result of your code to a new DataFrame:

df2 = df.groupby(['question','profession']).count()

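Now you can work with df2. The question and profession values form its index (a MultiIndex), so df2.index lists the combinations and df2.reset_index() turns them back into ordinary columns.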

Is this what you were looking for?
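
To make that concrete, here is a minimal sketch of reading the grouped result back out. It assumes the six sample rows from the question, uses size() rather than count() so the result is a Series even when there are no other columns, and the names counts, pairs, flat and n are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "profession": ["AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC"],
    "question": ["question_a", "question_b", "question_a",
                 "question_d", "question_a", "question_c"],
})

# One row per (question, profession) pair; the pair lives in the MultiIndex
counts = df.groupby(["question", "profession"]).size()

# List every combination as plain tuples
pairs = counts.index.tolist()

# Or move the index levels back into ordinary columns
flat = counts.reset_index(name="n")
print(flat)
```

After reset_index, question and profession are regular columns again, so the usual filtering such as flat[flat["question"] == "question_a"] applies.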

gtomer

I think you need this:

In [676]: grp = df.groupby(['question'])
In [678]: for k, v in grp:
     ...:     print(v)
     ...:
  profession    question
0       AAAA  question_a
2       BBBB  question_a
4       CCCC  question_a
  profession    question
1       AAAA  question_b
  profession    question
5       CCCC  question_c
  profession    question
3       BBBB  question_d
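
If a dict is more convenient than printed sub-frames, the same loop can collect the professions for each question. This is only a sketch on the sample data from the question; professions_by_question is an illustrative name, and the grouping key is passed as a plain string so that each k is the question itself:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "profession": ["AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC"],
    "question": ["question_a", "question_b", "question_a",
                 "question_d", "question_a", "question_c"],
})

# k is the question, v is the sub-frame of rows that contain it
professions_by_question = {
    k: v["profession"].tolist() for k, v in df.groupby("question")
}
print(professions_by_question)
```
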
Mayank Porwal

You can use this:

df.groupby('profession')['question'].apply(','.join).reset_index()

Which gives the output

  profession               question
0       AAAA  question_a,question_b
1       BBBB  question_a,question_d
2       CCCC  question_a,question_c

You can replace ',' with '\n' if you want each question to appear on its own line.
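
The question asks for the grouping the other way round (professions per question), which should only need the two column names swapped. A sketch on the sample data from the question, with by_question as an illustrative name:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "profession": ["AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC"],
    "question": ["question_a", "question_b", "question_a",
                 "question_d", "question_a", "question_c"],
})

# Group by question and join the professions that share it
by_question = (
    df.groupby("question")["profession"]
      .apply(",".join)
      .reset_index()
)
print(by_question)
```

Here question_a should end up with AAAA,BBBB,CCCC, while the other questions each map to a single profession.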

Palash Goel