
I have strings like "I want to buy 2 kg of apples!". I want to remove certain punctuation from these sentences, and until now this was enough:

text = strings.ReplaceAll(text, ".", "")
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, "?", "")
text = strings.ReplaceAll(text, "!", "")

But when the sentence contains "2.5 kg apples", this turns it into "25 kg apples". How can I remove punctuation but keep the punctuation used inside numbers? My idea was to iterate over all characters manually, but I feel there must be a more efficient solution.

user1406177
  • Does this answer your question? [Removing punctuation from an extremely long string](https://stackoverflow.com/questions/48734599/removing-punctuation-from-an-extremely-long-string) – segFault Jun 12 '20 at 11:42

1 Answer


You could use the regexp package to first find every punctuation character together with the characters surrounding it, then check whether each match is part of a float (e.g. 2.5) or plain punctuation. Replace the plain punctuation and leave the floats alone.

Example:

package main

import (
  "fmt"
  "regexp"
  "strings"
)

func main() {
  text := "I want. to, buy 2.5 kg of apples!"

  // Regexp that finds all punctuation characters, capturing the characters on either side
  re := regexp.MustCompile(`(.{0,1})([^\w\s])(.{0,1})`)

  // Regexp that determines if a given string is wrapped by digit characters
  isFloat := regexp.MustCompile(`\d([^\w\s])\d`)

  // Get the parts using the punctuation regexp... e.g. "t. "
  parts := re.FindAllString(text, -1)

  // Iterate through the parts
  for _, part := range parts {
    // Determine if the part is a float...
    isAFloat := isFloat.MatchString(part)
    // If it is not a float, make a single replacement to remove the punctuation
    if !isAFloat {
      newPart := re.ReplaceAllString(part, "$1$3")
      text = strings.Replace(text, part, newPart, 1)
    }
  }
  // prints: "I want to buy 2.5 kg of apples"
  fmt.Println(text)
}


Depending on the strings you expect, you may have to wrap this in a function and run it on its own output repeatedly until nothing changes, e.g. for a string with consecutive punctuation such as "I am not going to replace fully...".
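
If the repeated runs are a concern, a single pass over the runes (the approach the question itself suggests) may be simpler. The following is only a sketch and not the regexp method shown above: stripPunctuation is a hypothetical helper name, and it assumes that "punctuation" means Unicode category P as reported by unicode.IsPunct. That differs from [^\w\s]: symbols such as "$" or "+" are kept, while "_" is treated as punctuation. A punctuation rune is kept only when the runes directly before and after it are both digits. Because it works by position, it also leaves "2.5" intact in an input like "I bought 2.5.", where the version above would turn it into "25", since strings.Replace targets the first textual occurrence of a part rather than the position that was matched.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripPunctuation is a hypothetical helper (not part of the answer's code).
// It drops every punctuation rune unless the runes immediately before and
// after it are both digits, as in "2.5".
func stripPunctuation(text string) string {
	runes := []rune(text)
	var b strings.Builder
	b.Grow(len(text))
	for i, r := range runes {
		if unicode.IsPunct(r) {
			betweenDigits := i > 0 && i < len(runes)-1 &&
				unicode.IsDigit(runes[i-1]) && unicode.IsDigit(runes[i+1])
			if !betweenDigits {
				continue // skip punctuation that is not inside a number
			}
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(stripPunctuation("I want. to, buy 2.5 kg of apples!"))
	fmt.Println(stripPunctuation("I am not going to replace fully..."))
}
```

Since each rune is visited once, no second run is needed for consecutive punctuation. Note that this sketch also keeps separators in inputs like "1,000" or "1.2.3", which may or may not be what you want.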

segFault
  • I made an edit to my answer to actually do the proper replacement. Not sure on the performance though. – segFault Jun 12 '20 at 12:35