-1

Given a dataframe, I want to count how many rows have a specific value in one column and another specific value in a second column. In this example the output should be 2: the condition is df['z'] == 4 and df['x'] == 'C', and only rows 10 and 11 satisfy both. My code does not print a count, only this warning:

:5: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  if (df[df['z']== 4].index.values) == (df[df['x']== 'C'].index.values):

Besides fixing this, is there a more Pythonic way of doing it without a for loop?

import numpy as np
import pandas as pd
data = [['A', 1, 2, 5, 'blue'],
        ['A', 1, 5, 6, 'blue'],
        ['A', 4, 4, 7, 'blue'],
        ['B', 6, 5, 4, 'yellow'],
        ['B', 9, 9, 3, 'blue'],
        ['B', 7, 9, 1, 'yellow'],
        ['B', 2, 3, 1, 'yellow'],
        ['B', 5, 1, 2, 'yellow'],
        ['C', 2, 10, 9, 'green'],
        ['C', 8, 2, 8, 'green'],
        ['C', 5, 4, 3, 'green'],
        ['C', 8, 4, 3, 'green']]
df = pd.DataFrame(data, columns=['x','y','z','xy', 'color'])
k=0
print((df[df['z']==4].index.values))
print(df[df['x']== 'C'].index.values)
for i in (df['z']):
    if (df[df['z']== 4].index.values) == (df[df['x']== 'C'].index.values):
        k+=1
        print(k)
d8a988

4 Answers

2

Try building a Boolean mask for your condition:

c=df['z'].eq(4) & df['x'].eq('C')

Then count the rows that match:

count=df[c].index.size
#OR
count=len(df[c].index)

Output:

print(count)
2
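To connect this with the index-based attempt in the question: the two index arrays printed there have different lengths (3 and 4), so they cannot be compared element by element, which is what the warning reports. The sketch below, which reproduces only the x and z columns of the question's data, shows that intersecting the two indexes gives the same rows as the combined mask. The variable names idx_z, idx_x and common are illustrative only.

```python
import pandas as pd

# Only the two columns used in the condition, copied from the question.
df = pd.DataFrame({
    "x": list("AAABBBBBCCCC"),
    "z": [2, 5, 4, 5, 9, 9, 3, 1, 10, 2, 4, 4],
})

idx_z = df[df["z"] == 4].index      # labels of rows where z is 4
idx_x = df[df["x"] == "C"].index    # labels of rows where x is 'C'

# Labels present in both indexes; this is a set operation, not an
# element-by-element comparison, so differing lengths are fine.
common = idx_z.intersection(idx_x)

# The combined Boolean mask from this answer selects the same rows.
c = df["z"].eq(4) & df["x"].eq("C")

print(len(idx_z), len(idx_x))
print(common.tolist(), len(df[c].index))
```

The mask version is usually preferable because it never builds the intermediate filtered frames.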
Anurag Dabas
2

You can do the following:

df[(df['z']==4) & (df['x']=='C')].shape[0]

#2
IoaTzimas
1

If only the count is needed and not the filtered frame, counting the True values in the Boolean Series is faster:

Calculate the conditions as Boolean Series:

m = df['z'].eq(4) & df['x'].eq('C')

Count True values via Series.sum:

k = m.sum()

or via np.count_nonzero:

k = np.count_nonzero(m)

k:

2

Timing Information via %timeit:

All timings exclude creation of the Boolean mask, since every approach uses the same mask and that cost is identical in all cases:

m = df['z'].eq(4) & df['x'].eq('C')

Henry Ecker (This Answer)

%timeit m.sum()
25.6 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.count_nonzero(m)
7 µs ± 267 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

IoaTzimas

%timeit df[m].shape[0]
151 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Anurag Dabas

%timeit df[m].index.size
163 µs ± 3.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit len(df[m].index)
165 µs ± 5.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

SeaBean

%timeit df.loc[m].shape[0]
151 µs ± 5.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

(Without .loc, this is the same as IoaTzimas's approach.)
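%timeit is an IPython magic, so the measurements above cannot be rerun in a plain script as written. A rough equivalent using the standard-library timeit module might look like the sketch below. The candidates dictionary and the choice of 1000 repetitions are assumptions for illustration, only the x and z columns are reproduced, and absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": list("AAABBBBBCCCC"),
    "z": [2, 5, 4, 5, 9, 9, 3, 1, 10, 2, 4, 4],
})

# Build the mask once, outside the timed code, as in the timings above.
m = df["z"].eq(4) & df["x"].eq("C")

candidates = {
    "m.sum()": lambda: m.sum(),
    "np.count_nonzero(m)": lambda: np.count_nonzero(m),
    "df[m].shape[0]": lambda: df[m].shape[0],
    "len(df[m].index)": lambda: len(df[m].index),
}

# Every approach should return the same count.
results = {name: int(fn()) for name, fn in candidates.items()}

# Average seconds per call over a fixed number of repetitions.
n = 1000
timings = {name: timeit.timeit(fn, number=n) / n for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:22s} {seconds * 1e6:8.1f} µs per call (result: {results[name]})")
```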

Henry Ecker
0

You can pass the Boolean mask to .loc to select the matching rows, then use shape[0] to get the row count:

df.loc[(df['z']== 4) & (df['x']== 'C')].shape[0]

Row selection with a Boolean mask works with or without .loc, so this is equivalent to:

df[(df['z']== 4) & (df['x']== 'C')].shape[0]

However, it is good practice to use .loc rather than omit it. You can refer to this post for further information.

Result:

2
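One practical difference, shown in the sketch below, is that .loc accepts a column label alongside the row mask, so rows and columns can be selected in a single call. The example reproduces only the x, z and color columns of the question's data, and the names mask, count and colors are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "x": list("AAABBBBBCCCC"),
    "z": [2, 5, 4, 5, 9, 9, 3, 1, 10, 2, 4, 4],
    "color": ["blue", "blue", "blue", "yellow", "blue", "yellow",
              "yellow", "yellow", "green", "green", "green", "green"],
})

mask = (df["z"] == 4) & (df["x"] == "C")

# Row count of the filtered frame, as in this answer.
count = df.loc[mask].shape[0]

# Same row mask plus a column label: only the 'color' values of matching rows.
colors = df.loc[mask, "color"]

print(count)
print(colors.tolist())
```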
SeaBean
  • How would it be possible to output how many times 4 appears in categories A and B without repeating this line for every single category: df.loc[(df['z']== 4) & (df['x']== 'C')].shape[0]? – d8a988 Jun 26 '21 at 14:56
  • @d8a988 You can use: `df.set_index('x').groupby(level=0).apply(lambda d: d.eq(4).sum())`. Then, you can see under each category, the counts of 4 in all columns (except x which holds the category) – SeaBean Jun 26 '21 at 15:40
  • @d8a988 Or, if you want to show also for the category matching your value, you can also use: `df.eq(4).groupby(df['x']).sum()` Your matching value 4 is inside the `.eq()` in `.eq(4)`. – SeaBean Jun 26 '21 at 15:52
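If only column z matters, the groupby idea from these comments can be narrowed to a single column. The following is a minimal sketch under that assumption, reproducing only the x and z columns of the question's data. The name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "x": list("AAABBBBBCCCC"),
    "z": [2, 5, 4, 5, 9, 9, 3, 1, 10, 2, 4, 4],
})

# Boolean Series marking z == 4, grouped by the category in x.
# Summing the True values per group gives one count per category.
counts = df["z"].eq(4).groupby(df["x"]).sum()

print(counts)
```

A single category can then be read off with, for example, counts["C"], without repeating the filter expression per category.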