
I have the following DataFrame:

import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]

df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df

and would like to get the total number of Males who either Agree or Disagree. I expect the answer to be 3, but instead I get 9 from:

sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])

I have done this through other methods and I'm trying to understand list comprehension better.

wjandrea
Clay Campbell
    Does this answer your question? [How to iterate through two lists in parallel?](/q/1663807/4518341) I know it says "lists", but it works the same for Series. – wjandrea Jan 23 '22 at 17:25

1 Answer


Let's unpack this a bit. The original statement

total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])

is equivalent to

total = 0
for x in df['Opinion']:
    for y in df['Sex']:
        if x in ['Agree', 'Disagree']:
            if y=='Male':
                total += 1

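Written this way, it should be clear why you get 9: each of the 3 opinions that are 'Agree' or 'Disagree' is paired with each of the 3 'Male' entries, so the counter is incremented 3 × 3 = 9 times.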
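To make the pairing visible, here is a minimal sketch that materializes the pairs the nested comprehension walks over. The plain lists `opinions` and `sexes` are stand-ins I'm assuming for `df['Opinion']` and `df['Sex']`, so the snippet runs without pandas; the names `all_pairs` and `matching_pairs` are only for illustration.

```python
# Plain lists standing in for df['Opinion'] and df['Sex'] (an assumption made
# so this sketch runs without pandas).
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# Two `for` clauses nest, so every opinion is paired with every sex.
all_pairs = [(x, y) for x in opinions for y in sexes]

# Keep only the pairs that pass both filters from the original statement.
matching_pairs = [(x, y) for x, y in all_pairs
                  if x in ['Agree', 'Disagree'] and y == 'Male']

print(len(all_pairs))       # 16 pairs in total (4 * 4)
print(len(matching_pairs))  # 9 pairs pass the filters (3 * 3)
```

Note that none of these pairs is tied to a particular row: the first 'Agree' is matched against all four entries of `sexes`, not just its own.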

What you actually want is to consider only corresponding pairs from two equal-sized iterables. Python's handy built-in zip does just this:

total = 0
for x,y in zip(df['Opinion'], df['Sex']):
    if x in ['Agree', 'Disagree'] and y=='Male':
        total += 1

or as a comprehension (written here as a generator expression, so no intermediate list is built):

total = sum(1 for x,y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y=='Male')
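Since the question is about list comprehensions specifically, here is a small self-contained sketch comparing the generator form with the list-comprehension form. Again, `opinions` and `sexes` are plain lists I'm assuming in place of the DataFrame columns, and `total_gen` / `total_list` are illustrative names.

```python
# Plain lists standing in for df['Opinion'] and df['Sex'] (an assumption made
# so this sketch runs without pandas).
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# Generator expression: values are fed to sum() one at a time.
total_gen = sum(1 for x, y in zip(opinions, sexes)
                if x in ['Agree', 'Disagree'] and y == 'Male')

# List comprehension: builds [True, True, True] first, then sums it
# (True counts as 1 when summed).
total_list = sum([True for x, y in zip(opinions, sexes)
                  if x in ['Agree', 'Disagree'] and y == 'Male'])

print(total_gen, total_list)  # 3 3
```

Both give the same count; the only difference is whether the intermediate list is constructed before summing.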
jodag
  • And also, you don't use list comprehension at the end, right :) I think OP noticed that since they accepted so I leave now. –  Jan 23 '22 at 18:06
    @Neither hehe, yeah I switched it for a generator since I can't bring myself to recommend constructing the full list only for it to be immediately reduced by a sum. Thought it might make things confusing to add that discussion in the answer. – jodag Jan 23 '22 at 18:12