Using LINQ to remove vowels from a string

Question

I want to remove the vowels from each string in a string array. I did it with foreach loops, but now I want to do it using LINQ or a lambda expression.

I have tried the following LINQ code:

string[] strArray = new string[] { "cello", "guitar", "violin"};
string[] vowels = new string[] { "a", "e", "i", "o", "u" };

var vNovowels = from vitem in strArray
                from vowel in vowels
                where vitem.Contains(vowel)
                select vitem.Replace(vowel, "");

foreach (var item in vNovowels)
{
    Console.WriteLine(item); 
}

But I am not getting the expected result.

The output I get from the above query is:

cllo
cell
guitr
gutar
gitar
voln
vilin

Desired output:

cll
gtr
vln

@OndrejJanacek It is a query. He's not trying to modify the existing collection but create a new one based on values from the source collection. If that wasn't ok then there would be no point of having a projection operation like `select`. — Dirk, Feb 20 '14 at 09:15

score 13 · Answer 1 · edited May 23 '17 at 11:43

You can accomplish this very efficiently using regular expressions to match all vowels and replace them with empty strings:

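using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strArray = new List<string> { "cello", "guitar", "violin" };
var pattern = @"[aeiou]"; // character class: matches any single vowel
// Strip every vowel (upper or lower case) from each word.
var noVowels = strArray.Select(item =>
    Regex.Replace(item, pattern, "", RegexOptions.IgnoreCase));
foreach (var item in noVowels)
{
    Console.WriteLine(item);
}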

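This produces the output you are looking for.

Your original attempt did not work because the two `from` clauses pair every word with every vowel. The query therefore yields one result per (word, vowel) pair in which the word contains that vowel, and each result has only that single vowel removed. The replacements are never applied to the same string one after another.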
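To make the difference concrete, here is a minimal sketch of a LINQ-only fix that keeps the question's array of vowel strings. Instead of pairing each word with each vowel, it folds all the vowels into one word with `Aggregate`, so each replacement works on the result of the previous one. The names `VowelFold` and `RemoveVowels` are made up for illustration, and the sketch only handles lower-case vowels, as in the question.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the original answer.
public static class VowelFold
{
    private static readonly string[] Vowels = { "a", "e", "i", "o", "u" };

    // Applies every vowel replacement to the same word, one after another.
    public static string RemoveVowels(string word) =>
        Vowels.Aggregate(word, (current, vowel) => current.Replace(vowel, ""));

    // Projects each input word to its vowel-free form.
    public static string[] RemoveVowels(string[] words) =>
        words.Select(w => RemoveVowels(w)).ToArray();
}
```

Calling `VowelFold.RemoveVowels(new[] { "cello", "guitar", "violin" })` gives `cll`, `gtr` and `vln`. Each `Replace` still allocates a new string, so this is meant to show the shape of the query rather than to compete with the regex version on speed.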

Update: I did some basic benchmarking of this solution against Matthias' HashSet<char>-based solution (benchmark code here), including both compiled and non-compiled variants of the Regex version. I ran it against an array of 2582 lorem-ipsum words, iterating over the set 10 million times (about 25 billion words in total), in LINQPad, and took the average of 3 runs:

                  Init Each Time              Init One Time
                avg ms      % diff          avg ms     % diff
Regex            586          +1%            586          -
Regex Compiled   581          -              593         +1%
HashSet         2550        +339%            641        +10%

It turns out that if you initialize the HashSet and the pattern string only once, the two approaches perform very similarly. Regex beats HashSet, but only barely (80 ms faster over 25 billion words), and the compiled and non-compiled Regex versions perform almost identically. However, if you initialize the HashSet every single time you run it, the HashSet approach becomes several times slower.

The takeaway is that if you want to use the HashSet approach, be sure to initialize your HashSet only once per set of chars that you want to exclude.
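As a rough sketch of that takeaway (not the benchmark code itself), the set can live in a static field so it is built once and shared by every call. The class name `VowelFilter` is hypothetical, and including the upper-case vowels in the set is an assumption made here to mirror `RegexOptions.IgnoreCase`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; names are not from the benchmark code.
public static class VowelFilter
{
    // Built once when the class is first used, then reused for every word.
    // Upper-case vowels are an assumption, added to match IgnoreCase behaviour.
    private static readonly HashSet<char> Vowels = new HashSet<char>("aeiouAEIOU");

    // Keeps only the characters that are not in the vowel set.
    public static string RemoveVowels(string word) =>
        new string(word.Where(c => !Vowels.Contains(c)).ToArray());
}
```

With this layout, `strArray.Select(VowelFilter.RemoveVowels)` never rebuilds the set inside the loop, which is the "Init One Time" case from the table.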

answered Feb 20 '14 at 09:09

Yaakov Ellis


**Why** try to replace if the vowel doesn't exist? – Royi Namir Feb 20 '14 at 09:20
@RoyiNamir Why explicitly check whether the vowel exists if Regex.Replace does that anyway? – Dirk Feb 20 '14 at 09:26
@RoyiNamir, he's iterating over each word rather than each letter. – Sam Feb 20 '14 at 09:27
@RoyiNamir I think that you might be mistaken about the way that regular expressions work. It goes once through the string marking all of the pattern matches, and then does the replacement all at once (building the next string in a StringBuilder). Compare this to `String.Replace` which is allocating new strings left and right. – Yaakov Ellis Feb 20 '14 at 09:27
@YaakovEllis No, I'm saying: why use Regex.Replace on the word "bbb" if it doesn't contain any vowels? (In short, you're missing a Where clause.) – Royi Namir Feb 20 '14 at 09:29
@RoyiNamir Again, look at the code for `Regex.Replace`. It first matches the pattern (similar to `string.Contains`), then runs the `Replace` function. The `Replace` function doesn't do anything if there are no matches. – Yaakov Ellis Feb 20 '14 at 09:32
@YaakovEllis OK, got your point. I just prefer not to use regex pattern matching where I can do otherwise. My idea was to filter the relevant words before doing the regex match. – Royi Namir Feb 20 '14 at 09:34
+1 for using Regex - the best solution here. Using the `RegexOptions.Compiled` (and storing the Regex reference of course) would make it even better. – Matt Tester Feb 24 '14 at 03:06
@MattTester I just added RegexCompiled to the benchmarking. Surprisingly it didn't make that much of a difference (though I am sure that [in some scenarios](http://stackoverflow.com/a/7707369/51) it would be a much more significant factor) – Yaakov Ellis Feb 24 '14 at 06:24
@Yaakov Thanks for trying it. When the Regex can be reused (like in a loop or over the running time of an app) it will have more of an impact. – Matt Tester Feb 24 '14 at 06:29
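
The suggestion in these comments, storing a compiled `Regex` and reusing it, could look roughly like the following. This is a sketch under assumptions: the class name `VowelRegex` is invented for illustration, and whether `RegexOptions.Compiled` pays off depends on how often the instance is reused.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not taken from the benchmark code.
public static class VowelRegex
{
    // Constructed once and kept for the lifetime of the app, so the
    // compilation cost is paid a single time.
    private static readonly Regex Vowels =
        new Regex("[aeiou]", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces every vowel match with the empty string.
    public static string RemoveVowels(string word) => Vowels.Replace(word, "");
}
```

`VowelRegex.RemoveVowels("violin")` returns `vln`, and a word without vowels comes back unchanged.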

score 8 · Accepted Answer · edited May 23 '17 at 12:29

Although Yaakov's regex solution is much better in terms of elegance and efficiency, you can use Where for the sake of learning:

string[] strArray = new string[] { "cello", "guitar", "violin" };
var vowels = new HashSet<char>("aeiou"); // or: { 'a', 'e', 'i', 'o', 'u' };

var vNovowels2 = from vitem in strArray
                 select new string(vitem.Where(c => !vowels.Contains(c)).ToArray());

foreach (var item in vNovowels2)
{
    Console.WriteLine(item);
}

answered Feb 20 '14 at 09:07

Matthias Meid

12,455
7
45
79

I would have thought the `HashSet` would be more efficient than a regular expression. I think yours is more elegant, too! – Sam Feb 20 '14 at 09:24
@Sam I appreciate. To check speed I quickly (and primitively) benchmarked mine and Yaakov's by running it a million times each with a stopwatch, and it turned out the reg-ex approach takes approximately 38% of the time mine does... – Matthias Meid Feb 20 '14 at 09:44
FYI: I added some benchmarks to [my answer](http://stackoverflow.com/a/21903029/51) – Yaakov Ellis Feb 20 '14 at 10:18
Or: `var vowels = new HashSet<char>("aeiou");` – nmclean Feb 20 '14 at 15:52

score 1 · Answer 3 · answered Feb 20 '14 at 09:18

Regex.Replace is the best way to do this.

string[] strArray = new string[] { "cello", "guitar", "violin" };

// Character class: matches any vowel anywhere in the string.
var rx = new Regex("[aeiou]", RegexOptions.IgnoreCase);

var vNovowels = from vitem in strArray
                select rx.Replace(vitem, string.Empty);

foreach (var item in vNovowels)
{
    Console.WriteLine(item);
}
