
I want to apply this function

df.column.str.split(expand = True)

but the problem is that there are some "empty cells". By "empty" I mean that a cell contains only whitespace, for example 6 spaces. Moreover, this runs inside an iteration, so sometimes the cells contain 2 spaces instead.

How can I identify these "empty cells"?

PS:

df[df.column != '(6 spaces inside)']

works only for the particular case where the cell contains exactly 6 spaces.
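One way to flag whitespace-only cells regardless of how many spaces they hold is sketched below. It is a minimal illustration, not a definitive answer: the column name `name` and the sample values are hypothetical. Only a temporary copy is stripped for the comparison, so names such as "Jean Carlo" keep their inner space.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names, whitespace-only cells, and a missing value.
df = pd.DataFrame({"name": ["Jean Carlo", "      ", "Ana", "  ", np.nan]})

# Strip a temporary copy just for the test; the stored strings are untouched.
# NaN stays NaN after .str.strip(), and comparing NaN with "" gives False.
is_blank = df["name"].str.strip().eq("")

blank_rows = df[is_blank]   # rows whose cell holds only whitespace
kept_rows = df[~is_blank]   # everything else, inner spaces preserved
```

Note that a truly empty string ('') is flagged by this mask as well, which is usually what is wanted before splitting.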

EDIT 1: df.column has object dtype and holds people's names (one or more per cell, sometimes with errors).

EDIT 2: The idea is to remove these cells (rows) so that "str.split" can be applied successfully. This is an iteration, so sometimes I have cells with 6 spaces and other times cells with 2 spaces.

EDIT 3: I can't remove all whitespace, because then I wouldn't be able to split the strings (I have names like "Jean Carlo" that I want to separate).

FINAL SOLUTION: I solved the problem with the post that was flagged as a duplicate, only adding a '+' because I have whitespace in other cells.

Solution:

df = df.replace([r'^\s+$'], np.nan, regex=True)
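For context, here is a sketch of how that replacement could fit into the whole workflow: mark whitespace-only cells as missing, drop those rows, then split. The column name `name` and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two whitespace-only cells.
df = pd.DataFrame({"name": ["Jean Carlo", "      ", "Ana Maria Lopez", "  "]})

# ^\s+$ matches cells made up entirely of one or more whitespace characters.
df = df.replace(r"^\s+$", np.nan, regex=True)

# Drop the rows that are now missing, then split the rest on whitespace.
df = df.dropna(subset=["name"])
parts = df["name"].str.split(expand=True)
```

The anchors `^` and `$` are what keep names like "Jean Carlo" intact: the pattern only matches when the entire cell is whitespace.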

1 Answer

df['Col1'] = df['Col1'].map(lambda x: x.strip())

This removes all leading and trailing whitespace from every value in df['Col1'].

philshem
It_is_Chris
  • This can have unintended consequences. The OP only suggests that they want to identify cells that are entirely made up of whitespace, not affect whitespace on other strings – roganjosh Mar 19 '18 at 20:12
  • @roganjosh this is also a good point. It would probably be better to map a function that uses regex to normalize any multi-space value by returning a single space (or whatever you want your flag character to be), then subselect on the flag character – zyd Mar 19 '18 at 20:14
  • @zyd the duplicate I have proposed takes care of this – roganjosh Mar 19 '18 at 20:16
  • @roganjosh - it does seem to, thanks for pointing that out. – zyd Mar 19 '18 at 20:18
  • I tried zyd's solution and I have "'float' object has no attribute 'strip'" – Agus Velazquez Mar 19 '18 at 20:20
  • @roganjosh You are correct. That is my mistake. I suppose an alternative option would be to use replace: `df['Col1'] = df['Col1'].replace(' ','6 spaces')` – It_is_Chris Mar 19 '18 at 20:22
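The "'float' object has no attribute 'strip'" error reported in the comments is what happens when the column contains NaN, which pandas stores as a float, so calling strip() on it through map fails. A minimal sketch of a NaN-tolerant equivalent using the .str accessor follows; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a padded name, a missing value, a blank cell.
df = pd.DataFrame({"Col1": ["  Jean Carlo ", np.nan, "   "]})

# .str.strip() leaves missing values as NaN instead of calling strip() on a float.
stripped = df["Col1"].str.strip()
```

As noted in the comments, this still alters leading and trailing whitespace on every other string, so it is better suited to building a test mask than to overwriting the column.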