I have a pandas dataframe with three columns:

Name        Name2           DateTime
                            2016-06-10 05:22
                            2016-06-10 05:23
                            2016-06-10 14:25
Guest       Guest1          2016-06-10 15:32

I have to replace the empty cells with NaN, so Name and Name2 should be NaN in every row where they are blank. I tried the statement below:

 df3['Name'] = df3['Name'].replace(r'[^\s+]',np.nan, regex=True)

But because there is trailing whitespace after "Guest " in Name, all rows get replaced with NaN.

Edit:

This is our actual data.

Name              Name2                  DateTime
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:22
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:23
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 14:25
\t\tGuest\r\n\t  \t\tGuest1\r\n\t       2016-06-10 15:32

I used this to remove those escape characters.

df['Name'] = df['Name'].str.replace('\r','').str.replace('\t','').str.replace('\n','').str.replace('-','')

After removing those characters, I am not sure what gets inserted in that place now.

user3447653

3 Answers

Since you don't need the spaces, this should work.

import numpy as np
df3['Name'] = df3['Name'].replace(r'\s+', '', regex=True)  # Remove all whitespace characters
df3['Name'] = df3['Name'].replace('', np.nan)  # Exact match on ''; an empty regex would match every string
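
For the raw data shown in the question's edit, a minimal sketch of the same idea might look like the following. It assumes that a missing name is represented by a lone `-` surrounded by tabs and line breaks, and it uses `str.strip` rather than removing all whitespace so that any spaces inside a name are kept. The sample frame is built here only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame mimicking the raw data in the question's edit.
df3 = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "Name2": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest1\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

for col in ["Name", "Name2"]:
    # Trim leading/trailing whitespace (tabs, \r, \n); interior spaces survive.
    cleaned = df3[col].str.strip()
    # Assumption: cells that are exactly '' or '-' after trimming are missing.
    df3[col] = cleaned.replace(["", "-"], np.nan)

print(df3)
```

Matching `''` and `'-'` exactly (no `regex=True`) avoids accidentally matching cells that merely contain a hyphen.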
Yarnspinner

Another solution: compute the length of each value, then use boolean indexing to replace every value of length 0 or 1 with NaN:

print (df.applymap(len))
   Name  Name2  DateTime
0     0      0        16
1     0      0        16
2     0      0        16
3     5      6        16

df[df.applymap(len) < 2] = np.nan
print (df)
    Name   Name2          DateTime
0    NaN     NaN  2016-06-10 05:22
1    NaN     NaN  2016-06-10 05:23
2    NaN     NaN  2016-06-10 14:25
3  Guest  Guest1  2016-06-10 15:32
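
Note that `DataFrame.applymap` is deprecated in pandas 2.1 and later in favour of `DataFrame.map`. A rough sketch of the same length-based idea that avoids both names is to take `Series.str.len` column by column and then call `mask`. It assumes the name columns have already been cleaned down to empty strings; the frame and the `cols` list are illustrative.

```python
import pandas as pd

# Illustrative frame: names already cleaned, blanks are empty strings.
df = pd.DataFrame({
    "Name": ["", "", "", "Guest"],
    "Name2": ["", "", "", "Guest1"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

cols = ["Name", "Name2"]  # only the string columns that may be blank

# Length of every string, computed per column.
lengths = df[cols].apply(lambda s: s.str.len())

# Wherever the length is below 2, mask() substitutes NaN.
df[cols] = df[cols].mask(lengths < 2)

print(df)
```

Restricting the operation to `cols` leaves `DateTime` untouched, which matters if that column is later converted to a real datetime dtype.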
jezrael

If you're sure there won't be spaces between the names, you could use .strip() to remove any whitespace (or anything, really) off the ends of the string.

 df3['Name'] = df3['Name'].strip().replace(r'[^\s+]',np.nan, regex=True)
Michael Zhang
  • `strip` is not a method of a `Series`; `str.strip` is, and it requires the dtype to be `str`, which the OP has already indicated it is not – EdChum Jun 15 '16 at 13:56
  • Ah I see what the problem is. Maybe you could try using [`DataFrame.applymap()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap) and map each element using a function that strips the string, then replaces with nan. – Michael Zhang Jun 15 '16 at 14:01
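
The element-wise approach suggested in the last comment could be sketched roughly as follows. The helper `blank_to_nan` is a hypothetical name, and treating a lone `-` as missing is an assumption taken from the sample data in the question. The sketch uses `DataFrame.apply` with `Series.map` instead of `applymap`, since `applymap` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

def blank_to_nan(value):
    """Strip a string; return NaN if only '' or '-' remains (assumed placeholder)."""
    if not isinstance(value, str):
        return value  # leave non-string values (numbers, NaN, timestamps) alone
    stripped = value.strip()
    return np.nan if stripped in ("", "-") else stripped

# Illustrative frame mimicking the raw data in the question's edit.
df = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "Name2": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest1\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

# Series.map applies the helper to each element of each column.
df = df.apply(lambda col: col.map(blank_to_nan))

print(df)
```

Because the helper checks `isinstance(value, str)` first, it does not raise on columns whose dtype is not string, which addresses the concern in the first comment.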