I have a dataset of 500k rows whose formatting is fairly inconsistent. I'm using pandas in Spyder to clean it.

One column contains either numbers or strings. I would like to delete the entire row if the cell in that column holds a string.

My code is shown below, with some adjustments because the data is confidential.

import pandas as pd
import csv
# Note: error_bad_lines was removed in pandas 2.0; on newer versions use on_bad_lines="skip" instead.
mydataset = pd.read_csv('test.txt', error_bad_lines=False,
                    engine='python',
                    index_col=False, header=None, quoting=csv.QUOTE_NONE,
                    sep=r"[\s|,/]", names=["1","2","3","4","a","b","c",
                    "h","i","j","k","l","m","n","o","p","f","g",
                    "q","r","s","t","u","v","w","x","y","z",
                    "5","6","7","8","9","10","11","12","13","14"])

print (mydataset.shape)

columns =['3','4','h','a','b','c','i','j','k','l','m','n','f','g']
mydataset.drop(columns,inplace=True,axis=1)
print (mydataset.shape)

# Column "2" must be accessed with brackets: mydataset.2 is a syntax error.
mydataset = mydataset[mydataset["q"].notnull() & mydataset["r"].notnull() &
                      mydataset["s"].notnull() & mydataset["2"].notnull() &
                      (mydataset["2"] != "@")]

Pardon the naming convention of the header.

Example of data:
1    2    3    4   <--header
abc  123  123  bcd <--Data
123  123  123  bcd <--Data

I would like to detect the "abc" and remove the whole row.

Please advise!

1 Answer

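Using Series.map on the column in question, it might look like the code below (I am not quite sure all the syntax is right; column "1" is taken from the example data):

def is_number(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False
# Keep only the rows whose cell in column "1" parses as a number.
mydataset = mydataset[mydataset["1"].map(is_number)]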
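
Another way to drop the non-numeric rows is to let pandas attempt the conversion itself. The following is only a minimal sketch: the small frame `df` is a stand-in built from the two example rows in the question, and the column name "1" is assumed to be the one that mixes numbers and strings.

```python
import pandas as pd

# Toy frame mirroring the example rows in the question (a stand-in for mydataset).
df = pd.DataFrame({
    "1": ["abc", "123"],
    "2": ["123", "123"],
    "3": ["123", "123"],
    "4": ["bcd", "bcd"],
})

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(df["1"], errors="coerce")

# Keep only the rows where the conversion succeeded.
cleaned = df[numeric.notna()]
print(cleaned)
```

Note that this also drops rows where the cell was already missing, since those are NaN after the conversion as well.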
Xiaoyu Xu