1

I have data like this format:

Date,Open,High,Low,Close,Adj Close,Volume
2019-07-31,0.44,0.4401,0.44,0.44,0.44,32900
2019-08-01,0.45,0.45,0.45,0.45,0.45,200
2019-08-02,0.44,0.44,0.43,0.44,0.44,13800
2019-08-08,0.45,0.4501,0.45,0.4501,0.4501,400
2019-08-15,0.43,0.43,0.43,0.43,0.43,300
2019-08-15,0.0,0.0,0.0,0.43,0.43,0

Notice the last row has empty data.
How can i filter this row or delete this row?

df = None
for ticker in tickers:
    try:
        df = pd.read_csv('stock_data/daily/{}.csv'.format(ticker), parse_dates=True, index_col=0).dropna()
    except FileNotFoundError as e:
        continue    # continue with next ticker
    df_closes = df['Close']
    if len(df_closes) < 4:
        continue    # continue with next ticker
    df_closes = df_closes[pd.notnull(df['Close'])]   # delete rows with empty data
    df_closes = df_closes.reindex(index=df_closes.index[::-1]) # reversing

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
Daniel
  • 75
  • 1
  • 7
  • What do you mean it has *empty data* ? For the last row, the close is not null. What is the condition, empty "Open", "High" and "Low" ? – Paolo Aug 17 '19 at 15:02
  • The data in Open, High and Low is 0.0 Also there are two rows with 2019-08-15. The last row has to be filtered if just one of all the columns is 0.0 and if there are two of the same dates. – Daniel Aug 17 '19 at 15:04
  • Possible duplicate of [Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError](https://stackoverflow.com/questions/13851535/delete-rows-from-a-pandas-dataframe-based-on-a-conditional-expression-involving) – Paolo Aug 17 '19 at 15:07
  • https://stackoverflow.com/questions/41833624/delete-rows-based-on-a-condition-in-pandas – Paolo Aug 17 '19 at 15:08

2 Answers2

0

I assume that you consider a record to be empty given the Volume column is zero, we can filter these out with:

df = df[df['Volume'] > 0]

Or we can check if at least one of Open, High, Low is different from zero with, and filter out rows that have only zeros in these columns:

df = df[df[['Open', 'High', 'Low']].any(axis=1)]

You can remove duplicate indices with:

df = df[~df.index.duplicated()]

We can add a parameter to duplicated(..) which specifies which one to retain. Possible values are 'first', 'last', and False (this means all items of a duplicated index are removed). The default is 'first'.

You can make a combination of first removing records with no values, and then for example remove items with duplicated indices. I would not perform this in the opposite order, since then you risk removing the record with data, and retain the one without data.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
0

Delete the rows with the codes as below:

#Deleting the whole row if a specific column(multiple rows can be added into this list) has value zero
df[df['High'] != 0]
df[df['High'].ne(0)]

or

#If values in any of the column in a row has zero
df[(df != 0).all(1)]
df[~(df == 0).any(axis=1)]
moys
  • 7,747
  • 2
  • 11
  • 42