-3

How can I delete all characters from a string that are not present in an alphabet string?

I could use a for loop to check every character, but I need something with better performance because the application has to process multiple (up to ~200) big files. One string holds the content of one file.

gotqn
Asker
  • 1
    Have you looked at regular expressions at all? – rory.ap Jan 22 '15 at 16:04
  • 5
    _"I could use for loop to check every character"_ - one way or another, this must happen anyway. Just do it using a loop. – CodeCaster Jan 22 '15 at 16:05
  • 2
    @roryap: Why would regex be more performant than a `for` loop? – O. R. Mapper Jan 22 '15 at 16:06
  • 1
    Also, `replace` != `delete`. – 500 - Internal Server Error Jan 22 '15 at 16:06
  • 1
    @O.R.Mapper -- [Performant isn't a word](http://english.stackexchange.com/questions/38945/what-is-wrong-with-the-word-performant). – rory.ap Jan 22 '15 at 16:08
  • 1
    How are you supposed to check a string against a white list of characters without looking at every single character? And performance depends on your constraints. If it's processing time, loading all files and parallel processing them may come in handy. If it's memory usage, you may process each file line by line. – Ortiga Jan 22 '15 at 16:09
  • 1
    @roryap Seems to be more of a word more recently: https://books.google.com/ngrams/graph?content=performant&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cperformant%3B%2Cc0 – dav_i Jan 22 '15 at 16:11
  • @roryap: Good to know. Not a native English speaker, and in my language, it is. I guess "performant" sounded too plausibly English to me ;) – O. R. Mapper Jan 22 '15 at 16:13
  • @O.R.Mapper -- I'm just being pedantic :) I actually like words that software people "make up" or "adapt" from real words. That's how words end up in the dictionary. For the longest time, I couldn't find the definition of "deprecated" that had anything to do with software in the dictionary. – rory.ap Jan 22 '15 at 16:16

2 Answers

4

One option is to use LINQ:

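using System;
using System.Linq;

var s1 = "hello world";
var s2 = "abcdefghijklmno"; // the allowed characters

// Keep only the characters of s1 that also occur in s2; the order of s1 is preserved.
// A string is already an IEnumerable<char>, so ToCharArray() is not needed.
var s3 = new String((from c1 in s1
                     join c2 in s2 on c1 equals c2
                     select c1).ToArray());

Console.WriteLine(s3); // helloold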

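The advantage of using LINQ is that queries are evaluated lazily, so the same filter can be applied to a streamed source (for example, the lines returned by File.ReadLines) so as not to have to load the whole file into memory.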
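As a rough sketch of that streaming idea: the `AlphabetFilter` class and `FilterLines` method below are hypothetical names, not part of the original answer, and the `HashSet<char>` lookup is swapped in for the `join` so that each character costs one set lookup. It assumes the file can be processed line by line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class AlphabetFilter
{
    // Hypothetical helper: lazily yields each line of the file at `path`
    // with every character that is not in `alphabet` removed.
    public static IEnumerable<string> FilterLines(string path, string alphabet)
    {
        // Set of allowed characters for constant-time membership checks.
        var allowed = new HashSet<char>(alphabet);

        // File.ReadLines reads lazily, so only one line is held in memory at a time.
        foreach (var line in File.ReadLines(path))
        {
            yield return new string(line.Where(c => allowed.Contains(c)).ToArray());
        }
    }
}
```

Note that File.ReadLines strips the line terminators, so the caller decides whether to write each filtered line back with a newline or to concatenate them.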

Community
dav_i
2

Another way is to use Regex:

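using System;
using System.Text.RegularExpressions;

var s1 = "hello world";
var s2 = "abcdefghijklmno"; // inserted as-is into the character class below

// [^...] matches any character that is NOT listed; each match is replaced with "".
var s3 = Regex.Replace(s1, "[^" + s2 + "]", "");

Console.WriteLine(s3); // helloold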

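If you want all letters of the basic Latin alphabet, you can set s2 to "a-z", "A-Z", or "a-zA-Z". Because s2 is inserted into the character class as-is, characters that are special there (such as ], ^, - and \) must be escaped with a backslash if they are meant literally.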
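If the alphabet is arbitrary text rather than a hand-written range, one way to keep the pattern valid is to escape every character before building the character class. This is only a sketch: `RegexAlphabet` and `RemoveOthers` are hypothetical names, and it assumes a non-empty alphabet that should be treated as a list of literal characters (so a range such as "a-z" would no longer work as a range).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexAlphabet
{
    // Hypothetical helper: removes every character of `input` that is not in `alphabet`.
    // Assumes `alphabet` is non-empty and contains literal characters, not ranges.
    public static string RemoveOthers(string input, string alphabet)
    {
        // Write each character as \uXXXX so that ], ^, - and \ lose their special meaning.
        var charClass = string.Concat(alphabet.Select(c => "\\u" + ((int)c).ToString("X4")));

        // Negated character class: replace everything that is not allowed with "".
        return Regex.Replace(input, "[^" + charClass + "]", "");
    }
}
```

For example, `RegexAlphabet.RemoveOthers("a-b]c^d", "a-]")` keeps only the characters a, - and ].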

dav_i