I have a pandas DataFrame with columns

[Brand, CPL1, CPL4, Part Number, Calendar Year/Month, value, type]

When the values come out of statsmodels X13, they occasionally contain very large integers stored as strings, which make no sense in context, e.g.:

[float(1.2), float(1.3), str("63478"), float(1.1)]

How can I remove the rows where this has happened? Because the bad values are string representations of integers, casting them or any similar method does not help.

Jeremy Barnes
  • What's the source of the data? What's the origin of the defective columns (or rows in columns)? Some specific sample data and/or code would help. – John Zwinck Nov 04 '16 at 14:28
  • The origin is an SAP Hana xls file being imported to DataFrame, flattening each part number to a series and coming out of statsmodels x13. The series that come out of x13 contain those irregularities. – Jeremy Barnes Nov 04 '16 at 14:31

1 Answer

You can use boolean indexing, checking whether each value is a string:

DataFrame:

import pandas as pd

df = pd.DataFrame([[float(1.2), float(1.3), str("63478"), float(1.1)],
                   [float(1.2), float(1.3), float(1.1), str("63478")]]).T

print (df)
      0      1
0    1.2    1.2
1    1.3    1.3
2  63478    1.1
3    1.1  63478

print (df.applymap(lambda x: isinstance(x, str)))
       0      1
0  False  False
1  False  False
2   True  False
3  False   True

print (df.applymap(lambda x: isinstance(x, str)).any(axis=1))
0    False
1    False
2     True
3     True
dtype: bool

print (df[~df.applymap(lambda x: isinstance(x, str)).any(axis=1)])
     0    1
0  1.2  1.2
1  1.3  1.3
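
A note on versions: applymap was deprecated in pandas 2.1 in favor of DataFrame.map. A minimal sketch of the same row filter that should work on both older and newer releases, assuming the same toy frame as above, maps each column with Series.map instead (the names mask and clean are only illustrative):

```python
import pandas as pd

# Same toy frame as above: one stray string per column.
df = pd.DataFrame([[1.2, 1.3, "63478", 1.1],
                   [1.2, 1.3, 1.1, "63478"]]).T

# Element-wise string check, column by column.
# Series.map with a function is available in old and new pandas alike.
mask = df.apply(lambda col: col.map(lambda x: isinstance(x, str)))

# Keep only the rows that contain no string in any column.
clean = df[~mask.any(axis=1)]
print(clean)
```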

Series:

s = pd.Series([float(1.2), float(1.3), str("63478"), float(1.1)])
print (s)
0      1.2
1      1.3
2    63478
3      1.1
dtype: object

print (s.apply(lambda x: isinstance(x, str)))
0    False
1    False
2     True
3    False
dtype: bool

print (s[~s.apply(lambda x: isinstance(x, str))])
0    1.2
1    1.3
3    1.1
dtype: object
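
For a frame shaped like the one in the question, the check probably only needs to run on the value column. A rough sketch, assuming made-up sample rows and only two of the question's columns (the real data will differ):

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Part Number": ["A1", "A1", "A1", "A1"],
    "value": [1.2, 1.3, "63478", 1.1],
})

# True where the entry in "value" is a string.
is_str = df["value"].map(lambda x: isinstance(x, str))

# Drop those rows, then convert the remaining values to a float column.
clean = df[~is_str].copy()
clean["value"] = clean["value"].astype(float)
print(clean)
```

The .copy() avoids a SettingWithCopyWarning when the column is reassigned, and the astype(float) step turns the leftover object column back into a numeric one.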
jezrael