1

so, I've been working with pandas in python and I got extracted data from external system with lots of spaces at the end of each column. I got an idea to use on each Series a str.strip() method with a code:

Data["DESCRIPTION"] =  Data["DESCRIPTION"].str.strip()

It basically did its job but I noticed that when I check properties of data frame using I run into an issue that if in one value there were only spaces without any text then it is empty but it does not convert that scalar as null:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18028 entries, 0 to 18027
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   VIN             18028 non-null  object
 1   DESCRIPTION     18028 non-null  object
 2   DESCRIPTION 2   18028 non-null  object
 3   ENGINE          18023 non-null  object
 4   TRANSMISSION    18028 non-null  object
 5   PAINT           18028 non-null  object
 6   EXT_COLOR_CODE  18028 non-null  object
 7   EXT_COLOR_DESC  18028 non-null  object
 8   INT_COLOR_DESC  18028 non-null  object
 9   COUNTRY         18028 non-null  object
 10  PROD_DATE       18028 non-null  object
dtypes: object(11)
memory usage: 1.5+ MB

However checking a condition if the string is empty:

Data['DESCRIPTION 2'] == ""


    0        True
    1        True
    2        True
    3        True
    4        True
             ... 
    18023    True
    18024    True
    18025    True
    18026    True
    18027    True
    Name: DESCRIPTION 2, Length: 18028, dtype: bool

How could I possibly convert all those as null so I could drop them using dropna() function?

I'd be grateful for any suggestions.

Calle
  • 151
  • 1
  • 7

1 Answers1

1

To remove trailing spaces and replace an empty string or records with only spaces as Nan run the below command.

Data["DESCRIPTION"].str.strip().replace(r'^\s*$', np.nan, regex=True)

Please refer to this page Replacing blank values (white space) with NaN in pandas

Snehal Nair
  • 181
  • 1
  • 6