I'm cleaning a dataset for a project. It has a column called "Review Text," and I need to clean the rows that contain strange characters like "ΓÇÃ."

Using Excel, I was able to use the SUBSTITUTE function to write a formula that cleans one row. However, I'd like to know how to create a nested SUBSTITUTE formula to clean all the rows in my dataset, if that is even possible. Please feel free to share any other solutions to my problem.

Excel Function
=SUBSTITUTE(J7,"IΓÇÖve","I've")
DanB
Chewwe
  • XY Problem, just maybe? This looks like you imported the data wrong. Or am I mistaken? If not, then how many of these substrings have you got to substitute? – JvdV Oct 11 '19 at 18:04
  • Are those "strange" characters consistent or random? Do you have a list of the correct character for each strange string of characters? It is possible to remove these strange strings from the original text, but without a lookup list it is impossible to replace them with the desired character. – Terry W Oct 12 '19 at 03:28

1 Answer

In this case, I wouldn't recommend a formula-based solution, since you would have to work out manually which replacement each row needs. If there are many distinct types of error, you would need a separate SUBSTITUTE for each type, which means identifying every error by hand before you can write the formula that fixes it.

So it is much easier to use Find and Replace on each issue than to build a complex formula that does the same cleaning. If you select the entire "Review Text" column first, each fix you make applies to the whole column at once.
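If you are open to working outside Excel, the same column-wide Find and Replace idea can be sketched in Python. This is only an illustration, not part of the Excel approach above: the `REPLACEMENTS` table, the function names, and the sample rows are all hypothetical, and only the first pair (`ΓÇÖ` to an apostrophe) comes from the question. The other pairs are assumed examples you would swap for whatever your data actually contains.

```python
# Hypothetical lookup table: garbled substring -> intended text.
# Only the first pair comes from the question; the others are assumed examples.
REPLACEMENTS = {
    "ΓÇÖ": "'",  # from the question: "IΓÇÖve" -> "I've"
    "ΓÇ£": '"',  # assumed: a garbled opening quotation mark
    "ΓÇ¥": '"',  # assumed: a garbled closing quotation mark
}


def clean_text(text, replacements=REPLACEMENTS):
    """Apply each replacement in turn, much like a nested SUBSTITUTE."""
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return text


def clean_column(values, replacements=REPLACEMENTS):
    """Clean every cell of a column, given as a list of strings."""
    return [clean_text(value, replacements) for value in values]


# Made-up sample rows standing in for the "Review Text" column.
reviews = ["IΓÇÖve worn it twice", "She said ΓÇ£greatΓÇ¥", "No issues here"]
cleaned = clean_column(reviews)
print(cleaned)
```

Each new type of error then becomes one more entry in the table rather than one more level of nesting.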

To see whether a cell contains non-alphanumeric characters, you can use the formula in this related StackOverflow post. Put the formula in a column next to the "Review Text" column, then keep sorting on it to surface the cells that still contain such characters until you have used Find and Replace on every type of error.
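The flagging step can also be sketched in Python. This is a stand-in for the idea, not the formula from the linked post: the set of "allowed" characters (ASCII letters, digits, whitespace, and common punctuation) is an assumption, since a strictly alphanumeric test would flag ordinary spaces and full stops in review text. The function names and sample rows are hypothetical.

```python
import re

# Assumption: ASCII letters, digits, whitespace and common punctuation count
# as clean; anything else is treated as suspect.
SUSPECT = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()\-]")


def suspect_chars(text):
    """Return the distinct characters in text that fall outside the allowed set."""
    return sorted(set(SUSPECT.findall(text)))


def flag_rows(values):
    """Return (row_index, suspect_characters) for each row that needs attention."""
    return [
        (index, suspect_chars(value))
        for index, value in enumerate(values)
        if SUSPECT.search(value)
    ]


# Made-up sample rows standing in for the "Review Text" column.
reviews = ["IΓÇÖve worn it twice", "No issues here", "Fits well (size 8)!"]
flagged = flag_rows(reviews)
print(flagged)
```

Listing the suspect characters per row, rather than a plain yes/no flag, makes it easier to see which replacement each row still needs.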

NOTE: If you wish to keep the original "Review Text" column, copy it into a separate column first (e.g., "Review Text_old").