I have a large CSV file that has a list like so:

data1; data2; data3; data4

In data4, the content looks like so: Bad String

The content in data4 should look like: Correct String

What is the best way to iterate through every row and remove the extra spaces from data4? I know I have to access data4 by its position, which would be [3], but I don't know how to do that for every row while also removing the extra spaces.

Thanks in advance!

pault
  • Possible duplicate of [Simple way to remove multiple spaces in a string?](https://stackoverflow.com/questions/1546226/simple-way-to-remove-multiple-spaces-in-a-string) – pault Apr 18 '18 at 21:29

3 Answers

Have you tried regex?

import re

bad_string = "This   is     a bad    string"
good_string = re.sub(r'\s+', ' ', bad_string)  # replace each run of whitespace with a single space

Or, if you'd rather not import the re module, you can split on whitespace and use str.join():

bad_string = "This   is     a bad    string"
good_string = " ".join(bad_string.split())
pault
  • Thank you @pault for your response. I tried both methods and neither made any changes. Could it possibly be because I'm implementing the code to a file? Like so: good_string = re.sub('\s+', ' ', output_file_location) where "output_file_location" looks like: "2; PP; 16th and Congress; -97.97 30.27, -97.73 30.27, ...,-97.73 30.27" – Michael McKeever Oct 25 '16 at 19:30
  • @MichaelMcKeever - I'm not sure what you mean by implementing the code to a file. The code that I provided takes a string with any number of spaces and replaces multiple spaces with a single space. The first method uses `re.sub()`, which substitutes every match of the search pattern `\s+` (`\s` matches one whitespace character, `+` means one or more of them in a row) with a single space. The second method splits a string on whitespace and then joins the resulting list back into a string using a single space as the separator. – pault Oct 26 '16 at 20:24
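Neither method above touches a file on its own; each has to be applied to the text of the field in every row. A minimal sketch of that, assuming the file is semicolon-delimited and data4 is the fourth field (index 3). The function name clean_column and the sample line are hypothetical, and io.StringIO stands in for the real file:

```python
import csv
import io
import re


def clean_column(rows, index=3):
    """Yield each row with runs of whitespace in one column collapsed to a single space."""
    for row in rows:
        if len(row) > index:  # skip short or empty rows instead of raising IndexError
            row[index] = re.sub(r"\s+", " ", row[index]).strip()
        yield row


# Hypothetical sample line standing in for the real CSV file
sample = "2; PP; 16th and Congress; -97.97  30.27,   -97.73 30.27\n"

reader = csv.reader(io.StringIO(sample), delimiter=";", skipinitialspace=True)
cleaned = list(clean_column(reader))
print(cleaned[0][3])
```

For a real file, the io.StringIO object would be replaced by something like open(path, newline=""), and the cleaned rows could be written back out with csv.writer using the same delimiter.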

I suppose you mean there is an extra tab following some of the commas:

str1 = "; -77.1565506 38.8912708,\t -77.1552148 38.8913919,\t -77.1549278 38.8921727, -77.1557808 38.8916717, -77.1565506 38.8912708"
print(str1)

To strip out the tabs, use str.replace(old, new[, count]):

str1_notab = str1.replace('\t','')
print(str1_notab)
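As a small illustration of the optional third argument, here is a sketch using a shortened, made-up coordinate string (not the original data). Without a count every tab is removed; with a count of 1 only the first one is:

```python
# Hypothetical shortened sample with a tab after each comma
coords = "-77.15 38.89,\t -77.16 38.90,\t -77.17 38.91"

all_removed = coords.replace("\t", "")    # every tab removed
first_only = coords.replace("\t", "", 1)  # only the first tab removed

print(all_removed)
print(repr(first_only))  # repr() makes the remaining tab visible as \t
```

Note that this only removes tab characters; runs of ordinary spaces are left untouched.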
Chen

You can use the following to remove extra spaces, tabs, and newlines:

original_string = 'This   \t\n contains \n \t   extra  spaces.' 
clean_string = ' '.join(original_string.split())
print(clean_string)

# Output: This contains extra spaces.
S7pidey