0

I have calculated statistical values and written them to a csv file. The nan values are replaced with zeros. There are rows with only zeros and there are rows with both 0 and 0.0 values only. How can I delete these rows? According to the attached image rows number 5 , 6 (only 0.0s), 9 and 11 (both 0s and 0.0s) needs to get deleted. enter image description here

import pandas as pd
all_df = pd.read_csv('source.csv')
all_df.dropna(subset=df_all.columns.tolist()[1:], how='all', inplace=True)
all_df.fillna(0, inplace=True)
all_df.to_csv('outfile.csv', index=False)
seralouk
  • 30,938
  • 9
  • 118
  • 133
Hasa
  • 145
  • 2
  • 10
  • 1
    Please do not post images instead of data. It would be a good idea to read and conform to: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – sophros Jul 15 '19 at 10:23
  • 1
    Possible duplicate of [delete rows based on a condition in pandas](https://stackoverflow.com/questions/41833624/delete-rows-based-on-a-condition-in-pandas) – sophros Jul 15 '19 at 10:24
  • @sophros I couln't find the answer in that post. – Hasa Jul 15 '19 at 10:55

1 Answers1

2

Use all_df[(all_df.T != 0).any()] or all_df[(all_df != 0).any(axis=1)]:

all_df = pd.DataFrame({'a':[0,0,0,1], 'b':[0,0,0,1]})
print all_df
   a  b
0  0  0
1  0  0
2  0  0
3  1  1
all_df = all_df[(all_df.T != 0).any()]
all_df
   a  b
3  1  1

EDIT 1: After looking at your data, a solution is to convert all numerical columns to float and then do the operations. This problem arises from the way the initial data were saved into the .csv file.

all_df = pd.read_csv('/Users/me/Downloads/Test11.csv')

# do not select 'activity' column
df = all_df.loc[:, all_df.columns != 'activity']

# convert to float
df = df.astype(float)

# remove columns with all 0s
mask = (df != 0).any(axis=1)
df = df[mask]

#mask activity column
recover_lines_of_activity_column = all_df['activity'][mask]

# Final result
final_df = pd.concat([recover_lines_of_activity_column, df], axis = 1)    

Output:

enter image description here

seralouk
  • 30,938
  • 9
  • 118
  • 133
  • Thanks. But where should I add that line. To the code I have given do I just need to add all_df[(all_df.T != 0).any()]? – Hasa Jul 15 '19 at 10:36
  • please note that there are rows with 0.0 and or 0 (both even in one row) eg:row number 9. I need to delete those rows – Hasa Jul 15 '19 at 11:41
  • `(all_df != 0).any(axis=1)` would be better (transposing can be a costly operation) – IanS Jul 15 '19 at 11:43
  • @IanS doesn't work. There are rows with all 0.0 and rows with both 0.0 and 0. I need to delete all those rows. – Hasa Jul 15 '19 at 14:52
  • Try `numpy.isclose` – IanS Jul 15 '19 at 15:10
  • @Hasa can you post a part of your file so that I can check this ? – seralouk Jul 15 '19 at 15:21
  • @serafeim https://fil.email/HXZjjKLv I have post a part of the file. Thanks – Hasa Jul 16 '19 at 09:01
  • @serafeim when I open the csv file in Linux there are both 0.0 and 0 values. when I open the same file in windows there are only 0 values. Thats why I want to remove rows that has only 0 values as well as rows having both 0 and 0.0 values – Hasa Jul 16 '19 at 10:08
  • you need to upload some data because this does not happen in my computer. upload a part of the csv file please – seralouk Jul 16 '19 at 10:42
  • https://www.filemail.com/d/lsrjertaxrbnijn I have attached a copy of the csv file here. Its called Test11.csv – Hasa Jul 16 '19 at 10:47
  • see my updated answer. it works fine now. please consider accepting it – seralouk Jul 16 '19 at 16:48
  • @serafeim thanks. now all the rows with 0.0 values gets deleted. But in the output csv the 'activity' column is missing, which I need . How can I get the 'activity' column displayed in my output csv file. thanks – Hasa Jul 17 '19 at 10:14
  • @serafeim I have accepted your answer. Thanks a lot. – Hasa Jul 17 '19 at 11:34