Python - How to remove more than 1 whitespace

Question

As shown below, the strings contain a lot of extra whitespace at the start, at the end, and in the middle of each line. I am trying to remove the extra whitespace from the middle. Here is what I tried, but I keep getting errors like:

import pandas as pd

testdata = [{'col1': ' Sea Ice   Prediction     Network .    '},
            {'col1': ' Movies, Ratings, ....        etc.'},
            {'col1': 'Iceland, Greenland, Mountains  '},
            {'col1': ' My test file'}]
df = pd.DataFrame(testdata)

' '.join(testdata['col1'].split()) #Error: list indices must be integers or slices, not str

df['col1'].str.lstrip() #list indices must be integers or slices, not str
df['col1'].str.rstrip() #list indices must be integers or slices, not str

# Removes start and end, but not ideal to remove one line at a time.
' Sea Ice     Prediction Network .    '.lstrip()
' Sea Ice     Prediction Network .    '.rstrip()

How do I remove this? Thanks!

Clean Output: 

'Sea Ice Prediction Network .'
'Movies, Ratings, .... etc.'
'Iceland, Greenland, Mountains'
'My test file'

Why are you indexing into `testdata` when you have a DataFrame? — ayhan, Apr 23 '18 at 16:16

score 6 · Accepted Answer · answered Apr 23 '18 at 16:16

Using replace

df.replace({' +':' '},regex=True)
Out[348]: 
                             col1
0   Sea Ice Prediction Network . 
1      Movies, Ratings, .... etc.
2  Iceland, Greenland, Mountains 
3                    My test file
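Note that replacing ' +' with ' ' collapses runs of spaces but still leaves a single leading or trailing space where one existed. A minimal sketch of one way to also trim the ends, assuming a pandas version whose `Series.str.replace` accepts `regex=True`; the name `cleaned` is only illustrative:

```python
import pandas as pd

testdata = [{'col1': ' Sea Ice   Prediction     Network .    '},
            {'col1': ' Movies, Ratings, ....        etc.'},
            {'col1': 'Iceland, Greenland, Mountains  '},
            {'col1': ' My test file'}]
df = pd.DataFrame(testdata)

# Collapse each run of whitespace to one space, then trim both ends.
# `cleaned` is an illustrative name, not part of the original answer.
cleaned = df['col1'].str.replace(r'\s+', ' ', regex=True).str.strip()
print(cleaned.tolist())
```

Assigning the result back with df['col1'] = cleaned would update the column in place.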

BENY

score 1 · Answer 2 · answered Apr 23 '18 at 16:17

You can use the re module to replace each run of whitespace in a string with a single space, then strip any whitespace left at the start and end:

import re

re.sub(r'\s+', ' ', ' Sea Ice   Prediction     Network .    ').strip()
'Sea Ice Prediction Network .'

Does that space before the . matter?
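The same idea also works without re, and it shows why the attempt in the question failed: testdata is a list of dicts, so it has to be indexed one row at a time rather than with testdata['col1']. A small sketch using only built-in string methods; the name `cleaned` is hypothetical:

```python
testdata = [{'col1': ' Sea Ice   Prediction     Network .    '},
            {'col1': ' Movies, Ratings, ....        etc.'},
            {'col1': 'Iceland, Greenland, Mountains  '},
            {'col1': ' My test file'}]

# str.split() with no argument splits on runs of whitespace and drops
# empty pieces at the ends, so joining with ' ' normalizes the spacing.
# `cleaned` is an illustrative name.
cleaned = [' '.join(row['col1'].split()) for row in testdata]

for line in cleaned:
    print(repr(line))
```

This keeps the space before the '.' in the first string, since it only collapses whitespace and does not touch punctuation.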
