Find Most Frequent Words using LINQ

Question

I have been trying to find the most frequent words in a list of strings. I tried something like the answers to "Find the most occurring number in a List<int>", but that returns only one word, and I need every word that is tied for most frequent.

For example, if we run the LINQ query on the following list:

Dubai
Karachi
Lahore
Madrid
Dubai
Sydney
Sharjah
Lahore
Cairo

it should return:

ans: Dubai, Lahore

Where is the code that you have written to attempt to solve the problem? – Daniel A. White, May 16 '16 at 13:02

ocuenca · Accepted Answer · score 4 · 2016-05-16T13:01:26.500

Group by the word, keep only the groups that occur more than once, and order them by count:

var result = list
  .GroupBy(s => s)
  .Where(g => g.Count() > 1)
  .OrderByDescending(g => g.Count())
  .Select(g => g.Key);
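The query above returns every repeated word ordered by count, not only the words tied for the highest count that the question asks for. A minimal sketch that narrows it to the expected "Dubai, Lahore", assuming a hypothetical helper named TopFrequencyWords and C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns every word whose count equals the highest count,
// i.e. all words tied for "most frequent". GroupBy yields groups in the order each
// key first appears, so ties come back in first-seen order.
static List<string> TopFrequencyWords(IEnumerable<string> words)
{
    // Materialize the counts once so the grouping is not re-run for Max() and Where().
    var counts = words
        .GroupBy(w => w)
        .Select(g => new { Word = g.Key, Count = g.Count() })
        .ToList();

    // Max() throws on an empty sequence, so handle that case first.
    if (counts.Count == 0) return new List<string>();

    int max = counts.Max(c => c.Count);
    return counts.Where(c => c.Count == max).Select(c => c.Word).ToList();
}

var cities = new List<string> {
    "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
};
Console.WriteLine("ans: " + string.Join(", ", TopFrequencyWords(cities)));
```

Computing the maximum separately keeps the filter explicit; the Where(g => g.Count() > 1) clause from the answer is then unnecessary, although an all-unique list would return every word.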

edited May 16 '16 at 13:01

answered May 16 '16 at 12:55


Just an unrelated question: can we restrict the result to only those words that occur more than once? – Failed Scientist May 16 '16 at 13:00

score 2 · Answer 2 · answered May 16 '16 at 13:00

If you need all the words that occur more than once:

  List<string> list = new List<string>();
  list.Add("A");
  list.Add("A");
  list.Add("B");
  var most = (from i in list
              group i by i into grp
              orderby grp.Count() descending
              select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);

score 1 · Answer 3 · answered May 16 '16 at 13:05

If you want all of the words tied for the highest frequency, you can use this method:

public List<string> GetMostFrequentWords(List<string> list)
{
    // Materialize once so Any(), First() and Where() do not each re-run the grouping.
    var groups = list
        .GroupBy(x => x)
        .Select(x => new { word = x.Key, Count = x.Count() })
        .OrderByDescending(x => x.Count)
        .ToList();
    if (groups.Count == 0) return new List<string>();

    var maxCount = groups[0].Count;

    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList();
}

[TestMethod]
public void Test()
{
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList();
    var result = GetMostFrequentWords(list);

    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Dubai", result[0]);
    Assert.AreEqual("Lahore", result[1]);
}

score 1 · Answer 4 · answered May 16 '16 at 13:06

If you want only Dubai and Lahore (i.e. only the words with the top occurrence count, which is 2 in the sample):

  List<String> list = new List<String>() {
   "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
   };

  int count = -1;

  var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
      name = chunk.Key,
      count = chunk.Count()
     })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
      if (count < 0) {
        count = item.count; // side effect, alas (we don't know the top count a priori)

        return true;
      }
      else
        return item.count == count;
    })
    .Select(item => item.name);

Test:

  // ans: Dubai, Lahore
  Console.Write("ans: " + String.Join(", ", result));
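The Where above relies on the captured count variable, which is never reset, so re-enumerating result after list changes can filter against a stale value. One side-effect-free alternative (a sketch, not the answerer's code) groups the groups by their size and keeps the bucket with the largest size:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> {
    "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
};

// Group the words, then group those groups by their size. The bucket with the
// largest key holds every word tied for the top count; no captured state is needed.
var topBucket = list
    .GroupBy(s => s)
    .GroupBy(g => g.Count())
    .OrderByDescending(bucket => bucket.Key)
    .FirstOrDefault();

// FirstOrDefault() yields null when the input list is empty.
IEnumerable<string> result = topBucket == null
    ? Enumerable.Empty<string>()
    : topBucket.Select(g => g.Key).OrderBy(name => name);

Console.Write("ans: " + String.Join(", ", result));
```

Because the query holds no mutable state, result can be enumerated any number of times and always reflects the current contents of list.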

Amit · Answer 5 · 2016-05-16T13:24:17.973

I'm sure there is a better way, but here is something I managed to put together (which may help you to optimise it further):

List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Dubai");
list.Add("Lahor");
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Sarjah");

int most = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Select(grp => grp.Count()).First();
IEnumerable<string> mostVal = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Where(grp => grp.Count() >= most)
    .Select(grp => grp.Key);

This lists the entries that occur most frequently; if two entries share the same highest frequency, both are included.

Note: this does not filter out entries that occur only once, so if every entry occurs exactly once, all of them are returned.
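The snippet above groups the list twice, once to find most and once to select the keys. A hedged sketch of a single counting pass, assuming a hypothetical MostFrequent helper and swapping LINQ grouping for a Dictionary<string, int> counter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts each word in one pass while tracking the highest
// count seen, then returns every word that reached that count, sorted by name.
static List<string> MostFrequent(IEnumerable<string> words)
{
    var counts = new Dictionary<string, int>();
    int max = 0;
    foreach (var w in words)
    {
        counts.TryGetValue(w, out int c); // c is 0 when the word has not been seen yet
        counts[w] = ++c;
        if (c > max) max = c;
    }
    // Dictionary enumeration order is not guaranteed, so sort for a stable result.
    return counts.Where(kv => kv.Value == max)
                 .Select(kv => kv.Key)
                 .OrderBy(k => k)
                 .ToList();
}

var list = new List<string> { "Dubai", "Sarjah", "Dubai", "Lahor", "Dubai", "Sarjah", "Sarjah" };
Console.WriteLine(string.Join(", ", MostFrequent(list)));
```

Unlike First() in the snippet above, this returns an empty list for empty input instead of throwing.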
