
I am reading an .xlsx file in which some cells contain special characters. These characters cause problems when I insert the data into a database, so I am trying to strip them out, as shown below:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp2
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var name = "MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIモN";
            name = Regex.Replace(name, @"[^A-Za-z0-9 ]", "");
            Console.WriteLine(name);

        }
    }
}

But this also removes characters such as the period (.), hyphen (-), and comma (,), which is undesirable. How can I remove only the non-Roman characters?

Costa.Gustavo

1 Answer


Have a look at an ASCII table. From what you said, just strip out anything that isn't standard ASCII:

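// Requires: using System.Linq;
var name = "MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIモN";
// Keep only characters in the 7-bit ASCII range (code points 0-127).
name = new string(name.Where(c => (int)c <= 127).ToArray());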
mtreit
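If dropping everything outside ASCII is too broad or too narrow for your data, another option is to keep the regex from the question and simply widen its whitelist. The following is a minimal sketch, assuming the only punctuation you want to keep is the period, comma, and hyphen; the class and method names (WhitelistDemo, KeepBasicPunctuation) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistDemo
{
    // Hypothetical helper: removes every character that is NOT an ASCII
    // letter, an ASCII digit, a space, a period, a comma, or a hyphen.
    public static string KeepBasicPunctuation(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9 .,\-]", "");

    public static void Main()
    {
        var name = "MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIモN";
        // Prints: MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIN
        Console.WriteLine(KeepBasicPunctuation(name));
    }
}
```

Any other character you need to preserve (parentheses, ampersands, and so on) would have to be added to the character class explicitly.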
\x00-\xff is too restrictive; it does not include foreign alphabets. [\w-[a-zA-Z0-9]] would be better. But it depends on what needs to be matched; "non-Roman" is ambiguous. – TonyR Aug 27 '20 at 08:29
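The pattern in this comment uses .NET's character class subtraction. Here is a rough sketch of how it could be applied, assuming the goal is to delete word characters that are not ASCII letters or digits while leaving punctuation and spaces alone; the names SubtractionDemo and StripNonAsciiWordChars are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // [\w-[a-zA-Z0-9]] matches any word character (\w) except ASCII letters
    // and digits. Punctuation and spaces are not word characters, so they
    // are never matched and therefore survive the replacement.
    public static string StripNonAsciiWordChars(string input) =>
        Regex.Replace(input, @"[\w-[a-zA-Z0-9]]", "");

    public static void Main()
    {
        var name = "MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIモN";
        // Prints: MONEDA S.A. ADMINISTRADORA DE FONDOS DE INVERSIN
        Console.WriteLine(StripNonAsciiWordChars(name));
    }
}
```

Two caveats: \w also covers the underscore, so this exact pattern removes underscores too, and it removes accented Latin letters such as Ó because they fall outside a-zA-Z. Non-ASCII symbols that are not word characters are left untouched.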