
I'm new to programming and I'm working on a Python project using pandas. I want to change values in each row of a dataset using .loc, but it doesn't seem to work. The idea is to set a row to the EOL value if the row is equal to 0. The code doesn't raise an error, but my dataset is unchanged after the iterations. Here is the code:

for machines in telemetry_days['machineID']:
    EOL = 365
    i = 0

    for row in telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)]:

        if (row != 0):
            EOL = row

        elif (row == 0):
            telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)].iloc[i] = EOL
        i = i + 1

I think it's because I'm using .iloc, so it won't change the value of 'failure_comp1' in the dataset, but I can't figure out how to get a specific row from .loc without using .iloc. If anyone has any suggestions I'd be very grateful, thanks.

Here is the structure of the whole dataset (don't mind the NaNs): [image not included]

Here is what I have, for example (for one 'machine'):

index failure_comp1
67    0
254   150
568   0
850   0
998   345

I want it to become this:

index failure_comp1
67    365
254   150
568   150
850   150
998   345

It's a time series dataset and I want to label each component of each machine with its End Of Life time (number of days). I've already got it labeled at the date where it fails, but I want to have it labeled for each row of that specific component.

Ashitaka
  • Are you just wanting to replace any non 0 with 365 in the `'failure_comp1'` column? – chitown88 Jun 15 '21 at 14:36
  • provide a sample data set (just a few rows) and your desired output. I don't quite understand what you are trying to accomplish – chitown88 Jun 15 '21 at 14:40
  • Do you __need__ to use .loc and/or .iloc? – chitown88 Jun 15 '21 at 14:42
  • I edited my post, and no I don't need to use loc or iloc, i tried using .at() but I couldn't figure out a way either, so any way to do it is fine. – Ashitaka Jun 15 '21 at 14:59
  • Read [this](https://stackoverflow.com/questions/48173980/pandas-knowing-when-an-operation-affects-the-original-dataframe) and you'll know – qmeeus Jun 15 '21 at 15:09
  • So the first 0 needs to be 365, and then all subsequent 0s take whatever the previous non-zero value was? – chitown88 Jun 16 '21 at 08:01

1 Answer


I wouldn't iterate through the rows (although you could if you want; I'll show that solution too). What I would do instead is use .groupby('machineID'): 1) convert all the 0s to NaN, 2) forward fill the NaNs within each group, 3) this leaves any leading 0s of a group as NaN, so finally fillna with 365.

Given as a sample data set:

import pandas as pd

telemetry_days = pd.DataFrame({
    'machineID':['11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44'],
    'failure_comp1':[0,2,45,0, 
                     150,150,232,0, 
                     0, 0, 0, 0, 
                     0, 12, 0, 0,
                     345, 12, 0, 0]})

Code:

import numpy as np

# 1) Mark every 0 as missing so it can be forward filled
telemetry_days['failure_comp1'] = telemetry_days['failure_comp1'].replace(0, np.nan)
# 2) Forward fill within each machine, 3) fill the remaining leading NaNs with 365
telemetry_days['failure_comp1'] = telemetry_days.groupby('machineID')['failure_comp1'].ffill().fillna(365)
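
As a quick check, here is a minimal sketch that applies the same three steps to the single-machine example from the question. The machineID value 'A' and the helper name `label_eol` are made up for illustration; the index labels and failure_comp1 values are copied from the question. Casting back to int at the end is an extra step, added only because introducing NaN turns the column into floats.

```python
import numpy as np
import pandas as pd


def label_eol(df, default=365):
    """Hypothetical helper: replace 0s with the previous non-zero value per machine."""
    out = df.copy()
    # Step 1: treat 0 as "missing" so it can be forward filled.
    filled = out['failure_comp1'].replace(0, np.nan)
    # Step 2: forward fill within each machine.
    # Step 3: anything still missing (leading 0s) gets the default.
    filled = filled.groupby(out['machineID']).ffill().fillna(default)
    # NaN forced the column to float; cast back to whole days.
    out['failure_comp1'] = filled.astype(int)
    return out


example = pd.DataFrame(
    {'machineID': ['A'] * 5, 'failure_comp1': [0, 150, 0, 0, 345]},
    index=[67, 254, 568, 850, 998],
)
result = label_eol(example)
print(result['failure_comp1'].tolist())
```

The helper works on a copy, so `example` itself is left untouched and the result can be compared against the desired output in the question.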

If you want to use .loc or .iloc:

Here's how I would do it. I would loop through each unique machineID, filter the dataframe to get just the rows for that machine, then iterate through that sub-group. I also would not hard-code the i (index): .items() (called .iteritems() before pandas 2.0) and/or iterrows() will return the index value for you, so just use that.

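for machines in telemetry_days['machineID'].unique():
    EOL = 365

    # .items() yields (index label, value) pairs; it was called .iteritems() before pandas 2.0
    for i, row in telemetry_days[telemetry_days['machineID'] == machines]['failure_comp1'].items():

        if (row != 0):
            EOL = row

        elif (row == 0):
            # i is an index label, so assign with a single .loc call on the dataframe itself
            telemetry_days.loc[i, 'failure_comp1'] = EOL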
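
On why the loop in the question leaves the dataset unchanged: selecting with a boolean mask, as in `series.loc[mask]`, builds a new object, so a following `.iloc[i] = ...` writes to that temporary result rather than to the dataframe. The sketch below uses a small made-up frame (not the real dataset) to contrast that with a single `.loc[rows, column]` assignment; the variable names `before` and `after` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'machineID': ['11', '11', '22'],
                   'failure_comp1': [0, 150, 0]})
mask = df['machineID'] == '11'

# Chained form, as in the question: the boolean selection returns a new
# Series, and the assignment lands on that object, not on df.
subset = df['failure_comp1'].loc[mask]
subset.iloc[0] = 365
before = df['failure_comp1'].tolist()
print(before)

# One .loc call with a row selector and a column name writes into df itself.
df.loc[mask & (df['failure_comp1'] == 0), 'failure_comp1'] = 365
after = df['failure_comp1'].tolist()
print(after)
```

The first print shows the original values, while the second shows the 0 for machine '11' replaced; the 0 belonging to machine '22' is not selected by the mask and stays as it was.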
chitown88