Is there a kind of "when statement" in Python? In the code below, several of the conditions can be true for the same row. The way I wrote it, only the first condition is evaluated and the other two are ignored. How can I make it count the observations for every condition that holds?

counts_zero_time = 0
counts_zero_time_non_zero_dist = 0
counts_zero_time_zero_dist = 0

for index,row in df.iterrows():
    
    if row["time_interval"] == 0:
        counts_zero_time += 1
        
    elif row["time_interval"] == 0 and row["distance"] !=0:
        counts_zero_time_non_zero_dist += 1
    
    elif row["time_interval"] == 0 and row["distance"] ==0:
        counts_zero_time_zero_dist += 1
        
print(counts_zero_time, "observations have a time interval of 0")
print(counts_zero_time_non_zero_dist, "observations have a time interval of 0 and a non-zero distance")
print(counts_zero_time_zero_dist, "observations have a time interval and distance of 0")

In the output, I get the correct count for the first condition, but zero for the other two, which is incorrect:

1421 observations have a time interval of 0
0 observations have a time interval of 0 and a non-zero distance
0 observations have a time interval and distance of 0
Joehat
  • You can replace `elif` with `if`. But a better way would be to use pandas expressions; then you don't need to iterate through the dataframe. – jkr Jul 19 '21 at 20:10
  • Do you mean adding an individual line to compute each count? Like this: zero_interval = ```trip_char.loc[(df["time_interval"] == 0)]``` and then print its length? – Joehat Jul 19 '21 at 20:12
  • Yes, something like that. Please see my answer. – jkr Jul 19 '21 at 20:17

4 Answers

Based on your logic, you can either use independent if statements, like this:

for index, row in df.iterrows():
    if row["time_interval"] == 0:
        counts_zero_time += 1
        
    if row["time_interval"] == 0 and row["distance"] != 0:
        counts_zero_time_non_zero_dist += 1
    
    if row["time_interval"] == 0 and row["distance"] == 0:
        counts_zero_time_zero_dist += 1

Or you can nest the second and third conditions inside the first if, so that the distance is only checked for rows whose time interval is 0:

for index, row in df.iterrows():
    if row["time_interval"] == 0:
        counts_zero_time += 1
        
        if row["distance"] != 0:
            counts_zero_time_non_zero_dist += 1
        else:
            counts_zero_time_zero_dist += 1
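
To see the nested version run end to end, here is a minimal sketch. It assumes a small list of dicts as a stand-in for the rows that df.iterrows() would yield; the sample values are made up for illustration and are not from the question's data.

```python
# Hypothetical sample rows standing in for the rows of df; values are made up.
rows = [
    {"time_interval": 0, "distance": 0.0},
    {"time_interval": 0, "distance": 12.5},
    {"time_interval": 5, "distance": 3.0},
    {"time_interval": 0, "distance": 0.0},
    {"time_interval": 7, "distance": 0.0},
]

counts_zero_time = 0
counts_zero_time_non_zero_dist = 0
counts_zero_time_zero_dist = 0

for row in rows:
    if row["time_interval"] == 0:
        counts_zero_time += 1
        # Only rows that passed the outer check reach this inner test.
        if row["distance"] != 0:
            counts_zero_time_non_zero_dist += 1
        else:
            counts_zero_time_zero_dist += 1

print(counts_zero_time, "observations have a time interval of 0")
print(counts_zero_time_non_zero_dist, "observations have a time interval of 0 and a non-zero distance")
print(counts_zero_time_zero_dist, "observations have a time interval and distance of 0")
```

With the nested form, the two inner counts always add up to counts_zero_time, which is a quick sanity check on the result.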
DjaouadNM

You can use pandas operations to avoid iterating over the dataframe. Iterating over a dataframe row by row is generally considered an anti-pattern in pandas.

You can use boolean masking to find out which rows match your condition. Each comparison produces a boolean Series, and summing it counts the True values, which tells you how many rows match.

time_interval_is_zero = df.loc[:, "time_interval"] == 0
distance_is_zero = df.loc[:, "distance"] == 0

counts_zero_time = time_interval_is_zero.sum()
counts_zero_time_zero_dist = (time_interval_is_zero & distance_is_zero).sum()
counts_zero_time_non_zero_dist = (time_interval_is_zero & ~distance_is_zero).sum()

The ~ inverts the boolean values, so True becomes False and vice versa. The & computes an element-wise logical AND of the two masks, row by row.
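
As a self-contained check of the masking approach, here is a small sketch. The column names follow the question, but the DataFrame contents are made-up sample values, so treat the numbers as illustrative only.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question, values are made up.
df = pd.DataFrame({
    "time_interval": [0, 0, 5, 0, 7],
    "distance": [0.0, 12.5, 3.0, 0.0, 0.0],
})

# Each comparison yields a boolean Series with one entry per row.
time_interval_is_zero = df["time_interval"] == 0
distance_is_zero = df["distance"] == 0

# Summing a boolean Series counts its True values; int() gives a plain Python int.
counts_zero_time = int(time_interval_is_zero.sum())
counts_zero_time_zero_dist = int((time_interval_is_zero & distance_is_zero).sum())
counts_zero_time_non_zero_dist = int((time_interval_is_zero & ~distance_is_zero).sum())

print(counts_zero_time, "observations have a time interval of 0")
print(counts_zero_time_non_zero_dist, "observations have a time interval of 0 and a non-zero distance")
print(counts_zero_time_zero_dist, "observations have a time interval and distance of 0")
```

One thing to keep in mind: a missing distance (NaN) compares as not equal to 0, so such a row would be counted in the non-zero group, the same as in the loop version.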


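Regarding your original implementation: in an if/elif chain, only the first branch whose condition is True runs, and the remaining elif branches are skipped. Since your first condition is also part of the other two, those branches can never be reached. To evaluate all conditions, make each of them a separate if statement.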

jkr

An if/elif chain stops at the first condition that evaluates to True; the remaining branches are not checked. Consider this snippet:

if True:
    print('a')
elif True and True:
    print('b')
elif True and True:
    print('c')

Only the character 'a' gets printed, since the first condition is True. In this next snippet, however, 'a', 'b', and 'c' are all printed because each condition is evaluated independently of the others.

if True:
    print('a')
if True and True:
    print('b')
if True and True:
    print('c')

To fix the bug in your code, you have to evaluate each of your conditions independently inside the loop, e.g.:

if row["time_interval"] == 0:
    counts_zero_time += 1
if row["time_interval"] == 0 and row["distance"] != 0:
    counts_zero_time_non_zero_dist += 1
if row["time_interval"] == 0 and row["distance"] == 0:
    counts_zero_time_zero_dist += 1
0x263A

Change the order of the if/elif statements. Put the row["time_interval"] == 0 check last, as a separate if rather than an elif, so it runs regardless of the branches above it. Something like this:

for index,row in df.iterrows():
    if row["time_interval"] == 0 and row["distance"] !=0:
        counts_zero_time_non_zero_dist += 1
    elif row["time_interval"] == 0 and row["distance"] ==0:
        counts_zero_time_zero_dist += 1
    if row["time_interval"] == 0:
        counts_zero_time += 1
    
Dzemo997