Replace spaces with NaN in pandas dataframe

Question

I have a pandas dataframe with three columns:

Name        Name2           DateTime
                            2016-06-10 05:22
                            2016-06-10 05:23
                            2016-06-10 14:25
Guest       Guest1          2016-06-10 15:32

I have to replace the empty values with NaN, so Name and Name2 in the blank rows should be NaN. I tried the statement below:

 df3['Name'] = df3['Name'].replace(r'[^\s+]',np.nan, regex=True)

But since I have white spaces after "Guest " in Name, all 5 rows get replaced with NAN.

Edit:

This is our actual data.

Name              Name2                  DateTime
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:22
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:23
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 14:25
\t\tGuest\r\n\t  \t\tGuest1\r\n\t       2016-06-10 15:32

I used this to remove those escape characters.

df['Name'] = df['Name'].str.replace('\r','').str.replace('\t','').str.replace('\n','').str.replace('-','')

After removing those characters, I am not sure what gets inserted in that place now.

You could strip the spaces and then replace `df['Name'].str.strip().replace(r'[^\s+]',np.nan, regex=True)` — EdChum, Jun 15 '16 at 13:42
df['Name'] is of type float64. I am trying to convert it to string and then replace it but I am unsuccessful still. — user3447653, Jun 15 '16 at 13:44
@Yarnspinner: No there is no need of spaces. Just the text is enough for me. — user3447653, Jun 15 '16 at 14:10

score 1 · Answer 1 · answered Jun 15 '16 at 14:12

Since you don't need the spaces, this should work.

import numpy as np

df3['Name'] = df3['Name'].replace(r'\s+', '', regex=True)  # Remove all whitespace characters
df3['Name'] = df3['Name'].replace('', np.nan)  # Replace cells that are now empty with NaN (exact match)
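
A minimal sketch of the two-step approach above, applied to the raw values from the question's edit. The sample frame is rebuilt by hand, and the dash is removed together with the whitespace; both are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the raw values shown in the question's edit.
df3 = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

# Step 1: regex substitution removes every whitespace character and dash.
cleaned = df3["Name"].replace(r"[\s\-]+", "", regex=True)

# Step 2: cells that are now exactly '' become NaN (exact match, no regex).
df3["Name"] = cleaned.replace("", np.nan)

print(df3)
```

Note that step 1 also removes whitespace inside a value, which matches the asker's comment that only the text is needed.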

answered Jun 15 '16 at 14:12

Yarnspinner


score 1 · Accepted Answer · answered Jun 15 '16 at 14:30

Another solution: find the length of each value, then use boolean indexing to replace every value of length 0 or 1 with NaN:

print (df.applymap(len))
   Name  Name2  DateTime
0     0      0        16
1     0      0        16
2     0      0        16
3     5      6        16

df[df.applymap(len) < 2] = np.nan
print (df)
    Name   Name2          DateTime
0    NaN     NaN  2016-06-10 05:22
1    NaN     NaN  2016-06-10 05:23
2    NaN     NaN  2016-06-10 14:25
3  Guest  Guest1  2016-06-10 15:32
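
The length check above treats any one-character value as empty, so it would also blank out a genuine single-character name. A hedged alternative sketch, assuming the cells still hold the raw strings from the question's edit: strip the surrounding whitespace, then replace only cells that are empty or a lone dash. The sample frame and the helper name `blank_to_nan` are hypothetical.

```python
import numpy as np
import pandas as pd

def blank_to_nan(frame, columns):
    """Return a copy where the given string columns are stripped and
    cells that are empty or only a dash are replaced with NaN."""
    out = frame.copy()
    for col in columns:
        stripped = out[col].str.strip()        # drop leading/trailing whitespace
        is_blank = stripped.isin(["", "-"])    # placeholder values to discard
        out[col] = stripped.mask(is_blank, np.nan)
    return out

# Hand-built sample mirroring the raw values shown in the question's edit.
df = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "Name2": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest1\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

result = blank_to_nan(df, ["Name", "Name2"])
print(result)
```

Restricting the helper to named columns leaves DateTime untouched, and the original frame is not modified.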

answered Jun 15 '16 at 14:30

jezrael


Super. I am very happy, because I once did a similar replacement and it was a horror. – jezrael Jun 15 '16 at 14:42
Yeah it was pulling my leg since yesterday, thank you very much !! – user3447653 Jun 15 '16 at 14:45

score 0 · Answer 3 · answered Jun 15 '16 at 13:50

If you're sure there won't be spaces between the names, you could use .strip() to remove any whitespace (or anything, really) off the ends of the string.

 df3['Name'] = df3['Name'].strip().replace(r'[^\s+]',np.nan, regex=True)

answered Jun 15 '16 at 13:50

Michael Zhang


`strip` is not a method of a `Series`; `str.strip` is, and it requires the dtype to be `str`, which the OP has already indicated it is not – EdChum Jun 15 '16 at 13:56
Ah I see what the problem is. Maybe you could try using [`DataFrame.applymap()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap) and map each element using a function that strips the string, then replaces with nan. – Michael Zhang Jun 15 '16 at 14:01
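
A rough sketch of the element-wise idea from the comment above, under some assumptions: the function name `strip_or_nan` and the sample data are hypothetical, and a lone dash is treated as blank. It applies the function per column with `Series.map`, which has the same element-wise effect as `applymap` (deprecated since pandas 2.1 in favor of `DataFrame.map`). Non-string values pass through unchanged, so a column that already contains floats or NaN does not raise.

```python
import numpy as np
import pandas as pd

def strip_or_nan(value):
    """Strip whitespace from strings; return NaN for blank or dash-only text.
    Non-string values (e.g. floats, NaN) are returned unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    return np.nan if text in ("", "-") else text

# Hypothetical sample: raw strings plus an existing NaN and a spaces-only cell.
df = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\tGuest\r\n\t", np.nan],
    "Name2": ["\t\t-\r\n\t", "\t\tGuest1\r\n\t", "   "],
})

# Apply the function to every element, one column at a time.
cleaned = df.apply(lambda col: col.map(strip_or_nan))
print(cleaned)
```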
