Pandas: Delete rows with different encoding of 0s in python

Question

I have calculated statistical values and written them to a csv file. The nan values are replaced with zeros. There are rows with only zeros and there are rows with both 0 and 0.0 values only. How can I delete these rows? According to the attached image rows number 5 , 6 (only 0.0s), 9 and 11 (both 0s and 0.0s) needs to get deleted.

import pandas as pd
all_df = pd.read_csv('source.csv')
all_df.dropna(subset=df_all.columns.tolist()[1:], how='all', inplace=True)
all_df.fillna(0, inplace=True)
all_df.to_csv('outfile.csv', index=False)

Please do not post images instead of data. It would be a good idea to read and conform to: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — sophros, Jul 15 '19 at 10:23
Possible duplicate of [delete rows based on a condition in pandas](https://stackoverflow.com/questions/41833624/delete-rows-based-on-a-condition-in-pandas) — sophros, Jul 15 '19 at 10:24

seralouk · Accepted Answer · 2019-07-17T10:33:57.110

2

Use `all_df[(all_df.T != 0).any()]` or `all_df[(all_df != 0).any(axis=1)]`:

all_df = pd.DataFrame({'a':[0,0,0,1], 'b':[0,0,0,1]})
print all_df

all_df = all_df[(all_df.T != 0).any()]
all_df

   a  b
3  1  1

EDIT 1: After looking at your data, a solution is to convert all numerical columns to float and then do the operations. This problem arises from the way the initial data were saved into the .csv file.

all_df = pd.read_csv('/Users/me/Downloads/Test11.csv')

# do not select 'activity' column
df = all_df.loc[:, all_df.columns != 'activity']

# convert to float
df = df.astype(float)

# remove columns with all 0s
mask = (df != 0).any(axis=1)
df = df[mask]

#mask activity column
recover_lines_of_activity_column = all_df['activity'][mask]

# Final result
final_df = pd.concat([recover_lines_of_activity_column, df], axis = 1)

Output:

edited Jul 17 '19 at 10:33

answered Jul 15 '19 at 10:23

seralouk

30,938
9
118
133

Thanks. But where should I add that line. To the code I have given do I just need to add all_df[(all_df.T != 0).any()]? – Hasa Jul 15 '19 at 10:36
please note that there are rows with 0.0 and or 0 (both even in one row) eg:row number 9. I need to delete those rows – Hasa Jul 15 '19 at 11:41
`(all_df != 0).any(axis=1)` would be better (transposing can be a costly operation) – IanS Jul 15 '19 at 11:43
@IanS doesn't work. There are rows with all 0.0 and rows with both 0.0 and 0. I need to delete all those rows. – Hasa Jul 15 '19 at 14:52
Try `numpy.isclose` – IanS Jul 15 '19 at 15:10
@Hasa can you post a part of your file so that I can check this ? – seralouk Jul 15 '19 at 15:21
@serafeim https://fil.email/HXZjjKLv I have post a part of the file. Thanks – Hasa Jul 16 '19 at 09:01
@serafeim when I open the csv file in Linux there are both 0.0 and 0 values. when I open the same file in windows there are only 0 values. Thats why I want to remove rows that has only 0 values as well as rows having both 0 and 0.0 values – Hasa Jul 16 '19 at 10:08
you need to upload some data because this does not happen in my computer. upload a part of the csv file please – seralouk Jul 16 '19 at 10:42
https://www.filemail.com/d/lsrjertaxrbnijn I have attached a copy of the csv file here. Its called Test11.csv – Hasa Jul 16 '19 at 10:47
see my updated answer. it works fine now. please consider accepting it – seralouk Jul 16 '19 at 16:48
@serafeim thanks. now all the rows with 0.0 values gets deleted. But in the output csv the 'activity' column is missing, which I need . How can I get the 'activity' column displayed in my output csv file. thanks – Hasa Jul 17 '19 at 10:14
@serafeim I have accepted your answer. Thanks a lot. – Hasa Jul 17 '19 at 11:34

Pandas: Delete rows with different encoding of 0s in python

1 Answers1

Use `all_df[(all_df.T != 0).any()]` or `all_df[(all_df != 0).any(axis=1)]`:

EDIT 1: After looking at your data, a solution is to convert all numerical columns to float and then do the operations. This problem arises from the way the initial data were saved into the .csv file.

Linked

Pandas: Delete rows with different encoding of 0s in python

1 Answers1

Use all_df[(all_df.T != 0).any()] or all_df[(all_df != 0).any(axis=1)]:

EDIT 1: After looking at your data, a solution is to convert all numerical columns to float and then do the operations. This problem arises from the way the initial data were saved into the .csv file.

Linked

Use `all_df[(all_df.T != 0).any()]` or `all_df[(all_df != 0).any(axis=1)]`: