-1

I'm trying to do some data cleaning in a Jupyter Notebook and, since I'm not the best with the syntax, I have hit a roadblock. My DataFrame has a final column, 'peak mean'. If the mean in a row is 0, I would like to delete that row as well as the other row (directly above or below it) that has the same 'Customer' and 'Date' column values.

Example code I am trying to perform this on:

So far I have been trying something along these lines, with little success (both_index is the variable name of my DataFrame):

for i in both_index['peak mean']: #this is part of code below   

    if i == 0:
        for j in both_index['peak mean']:
            if both_index['Customer'][i] == both_index['Customer'][j]: #this is wrong and i dont know the syntax
                both_index.drop(j)
        both_index.drop(i)

Any help would be appreciated, thank you!

Ynjxsjmh
  • Is there an error when you run it? If so, please provide the full traceback. Thanks! :3 – DialFrost Aug 13 '22 at 05:15
  • The error is KeyError: 0.0, and I'm not sure what it means. I'm assuming something is wrong with the line if both_index['Customer'][i] == both_index['Customer'][j]: #this is wrong and i dont know the syntax – astronaut19 Aug 13 '22 at 05:21
  • Please [do not upload images of code/data/errors when asking a question](http://meta.stackoverflow.com/q/285551) and provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), including a small example input data and the corresponding expected result. One reason questions like these are hard to answer immediately is that it is unclear what exactly the columns are, what is the index, etc. Also, testing floats for 0 should usually allow for a small epsilon to account for limited numerical precision. – Pierre D Aug 13 '22 at 11:43
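To illustrate the last comment's point about an epsilon, here is a minimal sketch of an exact zero test next to a tolerant one. The sample values and the tolerance of 1e-9 are assumptions chosen for illustration, not taken from the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 'peak mean' values: an exact zero, a rounding-noise "zero", a real value
means = pd.Series([0.0, 1e-12, 0.5], name="peak mean")

# Exact comparison only catches the value that is exactly 0.0
exact = means == 0

# Tolerant comparison treats anything within atol of zero as zero
# (atol=1e-9 is an arbitrary choice; pick one that suits the data)
near = np.isclose(means.to_numpy(), 0.0, atol=1e-9)

print(exact.tolist())
print(near.tolist())
```

Either mask can then be used in place of the == 0 condition when selecting the rows to drop.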

3 Answers

0
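# Index by the two key columns so rows can be matched on (Customer, date)
both_index = both_index.set_index(['Customer', 'date'])
# Rows whose mean is 0; their index holds the (Customer, date) pairs to remove
df1 = both_index[both_index['peak mean'] == 0]
# Keep only the rows whose (Customer, date) pair is not among those
both_index = both_index.loc[~both_index.index.isin(df1.index)]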
  • Hey, thanks for your reply. I ran that and got an error on the first line, which says KeyError: "None of ['Customer', 'date'] are in the columns" – astronaut19 Aug 13 '22 at 05:45
  • Use the column names that are in your DataFrame. In the image above there are Customer and date, so replace Date with date. – Anonymous89u Aug 13 '22 at 05:51
  • Yeah, I changed them to the column names in my DataFrame and it still gives that error, which makes no sense to me. I did print(both_index.columns) and it printed those columns, but I can't see why it isn't reading them. – astronaut19 Aug 13 '22 at 05:57
  • Can you edit your answer so it explains what was the problem and how this code solves it? – trincot Aug 13 '22 at 13:03
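One possible cause of the KeyError reported in the comments, offered only as a guess: pandas raises "None of [...] are in the columns" from set_index when the named keys are already index levels rather than ordinary columns. The sketch below assumes that situation (the data is made up) and only calls set_index when it is needed.

```python
import pandas as pd

# Hypothetical data in which 'Customer' and 'date' are ALREADY index levels
both_index = pd.DataFrame(
    {"Customer": ["#1", "#1", "#2"], "date": [1, 1, 2], "peak mean": [1.0, 0.0, 3.0]}
).set_index(["Customer", "date"])

keys = ["Customer", "date"]

# Calling set_index again here would raise the KeyError from the comments,
# so only do it when the keys are not yet index levels
if not set(keys).issubset(both_index.index.names):
    both_index = both_index.set_index(keys)

# (Customer, date) pairs that have at least one zero mean
zero_mask = (both_index["peak mean"] == 0).to_numpy()
zero_keys = both_index.index[zero_mask]

# Drop every row that shares one of those pairs
cleaned = both_index.loc[~both_index.index.isin(zero_keys)]
print(cleaned)
```

Printing both_index.index.names on the real DataFrame would show whether this assumption applies.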
0

I'm not too familiar with pandas or Jupyter, but I found another question with an answer that looks promising. It mentioned a way to drop rows that match a condition, like this:

DF.drop(DF[DF.LABEL CONDITION].index, inplace=True)

In your case, it would look like this:

both_index.drop(both_index[both_index['peak mean'] == 0].index, inplace=True)

Back up your data if you can before trying this.

EDIT: Just noticed that this answer isn't complete. It won't delete the other rows with the same 'Customer' and 'Date'.
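One way to also catch the rows that share 'Customer' and 'Date' with a zero-mean row, sketched here with groupby and transform rather than the drop call above. The toy data is the same as in another answer on this page; the column names are assumed to match the real DataFrame.

```python
import pandas as pd

# Toy data; column names are assumed to match the asker's DataFrame
df = pd.DataFrame({
    "Customer": ["#1", "#1", "#1", "#2", "#2", "#3", "#3"],
    "Date": [1, 1, 2, 2, 2, 3, 3],
    "peak mean": [1, 0, 2, 2, 3, 0, 4],
})

# True for rows whose mean is exactly 0
is_zero = df["peak mean"].eq(0)

# Broadcast "this (Customer, Date) group contains a zero" back to every row
group_has_zero = is_zero.groupby([df["Customer"], df["Date"]]).transform("any")

# Keep only rows from groups without any zero mean
cleaned = df[~group_has_zero]
print(cleaned)
```

This leaves 'Customer' and 'Date' as ordinary columns, so no set_index step is needed.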

jcabre04
0

You could try this:

import pandas as pd
df = pd.DataFrame({'Customer': ['#1','#1','#1','#2','#2','#3','#3'], 'Date': [1,1,2,2,2,3,3], 'peak mean': [1,0,2,2,3,0,4]})
df = df.set_index(['Customer', 'Date'])    #do not miss this important step!
print(df)

               peak mean
Customer Date           
#1       1             1    #to delete
#1       1             0    #to delete
#1       2             2
#2       2             2
#2       2             3
#3       3             0    #to delete
#3       3             4    #to delete

In this example, we expect to keep only one row of customer #1 and both rows of customer #2; both rows of customer #3 are removed.

df1 = df.drop(df.loc[df['peak mean']==0].index)
print(df1)

               peak mean
Customer Date           
#1       2             2
#2       2             2
#2       2             3
threadfin