It does not work because skiprows in pandas omits rows by position (the 0-based line number in the file), not by value:
from io import StringIO
import pandas as pd

data = "i,a,b\ngood,1,2\nbad,3,a\nbad,a,b\ngood,1,2\nbad,3,a"
# baseline: read everything, nothing skipped
df = pd.read_csv(StringIO(data))
print(df)
      i  a  b
0  good  1  2
1   bad  3  a
2   bad  a  b
3  good  1  2
4   bad  3  a
# the callable receives the line position; position 0 is the header, so 2 is the first 'bad' row
df = pd.read_csv(StringIO(data), skiprows=lambda index: index == 2)
print(df)
      i  a  b
0  good  1  2
1   bad  a  b
2  good  1  2
3   bad  3  a
df = pd.read_csv(StringIO(data), index_col='i', skiprows=lambda index: index == 2)
print(df)
      a  b
i
good  1  2
bad   a  b
good  1  2
bad   3  a
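To see what the callable actually receives, here is a minimal sketch that records every value pandas passes to it. The names `seen` and `record` are made up for this illustration, and the exact number of calls may depend on the parser engine, so only the type of the values is checked:

```python
from io import StringIO

import pandas as pd

data = "i,a,b\ngood,1,2\nbad,3,a\nbad,a,b\ngood,1,2\nbad,3,a"

seen = []  # hypothetical helper list, only for this illustration


def record(index):
    # Store whatever pandas passes in and skip nothing.
    seen.append(index)
    return False


df = pd.read_csv(StringIO(data), skiprows=record)

# Every recorded value is an integer line position, never a label like 'bad'.
print(sorted(set(seen)))
print({type(i).__name__ for i in seen})
```

Because only integers arrive, a comparison such as 'bad' == index can never be True.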
A shorter way is to pass a list of positions:
df = pd.read_csv(StringIO(data), skiprows=[2])
print(df)
      i  a  b
0  good  1  2
1   bad  a  b
2  good  1  2
3   bad  3  a
But if you want to remove rows by index label, passing the label raises an error:
df = pd.read_csv(StringIO(data), index_col='i', skiprows=['bad'])
print(df)
TypeError: an integer is required
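Comparing against the label in a callable does not work either, and it raises no error, because the callable only ever receives integer positions: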
df = pd.read_csv(StringIO(data), index_col='i', skiprows=lambda index: 'bad' == index)
print(df)
      a  b
i
good  1  2
bad   3  a
bad   a  b
good  1  2
bad   3  a
df = pd.read_csv(StringIO(data), skiprows=lambda index: 'bad' == index)
print(df)
      i  a  b
0  good  1  2
1   bad  3  a
2   bad  a  b
3  good  1  2
4   bad  3  a
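If the goal is only to end up without the 'bad' rows, one alternative to skiprows is to read everything and filter afterwards. This is a sketch under the assumption that the whole file fits in memory; the variable names `filtered`, `indexed` and `numeric` are only for illustration:

```python
from io import StringIO

import pandas as pd

data = "i,a,b\ngood,1,2\nbad,3,a\nbad,a,b\ngood,1,2\nbad,3,a"

# Filter on the column value after reading.
df = pd.read_csv(StringIO(data))
filtered = df[df["i"] != "bad"]
print(filtered)

# Or, with 'i' as the index, drop every row carrying that label.
indexed = pd.read_csv(StringIO(data), index_col="i").drop("bad")
print(indexed)

# The 'bad' rows were still parsed, so 'a' and 'b' were read as strings;
# convert them back to numbers if needed.
numeric = filtered[["a", "b"]].apply(pd.to_numeric)
print(numeric.dtypes)
```

Unlike skiprows, this parses the unwanted rows first, which is why the numeric conversion at the end may be necessary.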
Verifying the sample from the pandas documentation, which skips every odd-numbered line, confirms that the callable works on positions:
df = pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
print(df)
      i  a  b
0   bad  3  a
1  good  1  2
df = pd.read_csv(StringIO(data), index_col='i', skiprows=lambda x: x % 2 != 0)
print(df)
      a  b
i
bad   3  a
good  1  2
EDIT: A possible solution is to preprocess the file first and collect the positions of the rows to skip:
df = pd.read_csv('a.csv')
print(df)
      i  a  b
0  good  1  2
1   bad  3  a
2   bad  a  b
3  good  1  2
4   bad  3  a
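import csv

# preprocessing: collect the positions of lines whose first field equals data
def get_row(data):
    out = []
    with open('a.csv', 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        for i, row in enumerate(reader):
            # skip empty lines, which csv.reader yields as []
            if row and row[0] == data:
                out.append(i)
    return out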
skip = get_row('bad')
print(skip)
[2, 3, 5]
# reuse the positions computed above instead of scanning the file again
df = pd.read_csv('a.csv', skiprows=skip)
print(df)
      i  a  b
0  good  1  2
1  good  1  2
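The same preprocessing idea can be sketched in a self-contained form that works on the in-memory string instead of a.csv. The helper name `positions_to_skip` and its `column` parameter are hypothetical, and the sketch assumes no quoted field spans several lines:

```python
import csv
from io import StringIO

import pandas as pd

data = "i,a,b\ngood,1,2\nbad,3,a\nbad,a,b\ngood,1,2\nbad,3,a"


def positions_to_skip(text, label, column=0):
    """Return 0-based line positions whose field in `column` equals `label`."""
    reader = csv.reader(StringIO(text))
    return [
        i
        for i, row in enumerate(reader)
        if len(row) > column and row[column] == label
    ]


skip = positions_to_skip(data, "bad")
print(skip)  # [2, 3, 5]

# The unwanted lines are never parsed, so 'a' and 'b' come back as integers.
df = pd.read_csv(StringIO(data), skiprows=skip)
print(df)
```

The cost is that the input is scanned twice, once by csv.reader and once by read_csv.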