2

I have been trying to find the most frequent words in a list of strings. I have tried something like the approach in "Find the most occurring number in a List<int>", but the issue is that it returns only one word, whereas I need all of the words that are most frequent.

For example, if we call that LINQ query on the following list:

Dubai
Karachi
Lahore
Madrid
Dubai
Sydney
Sharjah
Lahore
Cairo

it should give us:

ans: Dubai, Lahore

Failed Scientist

5 Answers

4

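Use a group by and then order by count. Note that this returns every word that occurs more than once (most frequent first), not only the words tied for the highest count:

var result = list
  .GroupBy(s => s)
  .Where(g => g.Count() > 1)
  .OrderByDescending(g => g.Count())
  .Select(g => g.Key);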
ocuenca
2

If you need all the words that occur more than once:

  List<string> list = new List<string>();
  list.Add("A");
  list.Add("A");
  list.Add("B");

  // Group identical words, sort by count, and keep only the repeated ones.
  var most = (from i in list
              group i by i into grp
              orderby grp.Count() descending
              select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);
Abdul Rehman Sayed
1

If you want to get several most frequent words, you can use this method:

public List<string> GetMostFrequentWords(List<string> list)
{
    var groups = list.GroupBy(x => x).Select(x => new { word = x.Key, Count = x.Count() }).OrderByDescending(x => x.Count);
    if (!groups.Any()) return new List<string>();

    var maxCount = groups.First().Count;

    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList();
}

[TestMethod]
public void Test()
{
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList();
    var result = GetMostFrequentWords(list);

    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Dubai", result[0]);
    Assert.AreEqual("Lahore", result[1]);
}
Alex Vazhev
1

In case you want Dubai, Lahore only (i.e. only words with top occurrence, which is 2 in the sample):

  List<String> list = new List<String>() {
   "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
   };

  int count = -1;

  var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
      name = chunk.Key,
      count = chunk.Count()
     })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
      if (count < 0) {
        count = item.count; // side effect, alas (we don't know the top count a priori)

        return true;
      }
      else
        return item.count == count;
    })
    .Select(item => item.name);

Test:

  // ans: Dubai, Lahore
  Console.Write("ans: " + String.Join(", ", result));
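If the side effect inside Where is a concern, one possible alternative is to materialise the counts first and compare against the maximum. This is only a sketch: the class name WordFrequency and the method name MostFrequent are made up for illustration, and the ordinal sort is an assumption about the desired output order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class WordFrequency
{
    // Returns every word whose count equals the highest count in the input.
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        // Group once and store the counts so the source is not re-enumerated.
        var counts = words
            .GroupBy(w => w)
            .Select(g => new { Word = g.Key, Count = g.Count() })
            .ToList();

        // An empty input has no most frequent word.
        if (counts.Count == 0)
            return new List<string>();

        // The top count is now known before filtering, so no captured state is needed.
        int max = counts.Max(c => c.Count);

        return counts
            .Where(c => c.Count == max)
            .Select(c => c.Word)
            .OrderBy(w => w, StringComparer.Ordinal) // assumed ordering: ordinal, ascending
            .ToList();
    }
}
```

Calling WordFrequency.MostFrequent(list) on the sample list above should yield Dubai and Lahore, since both occur twice and every other word occurs once.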
Dmitry Bychenko
1

I'm sure there must be a better way, but one thing I managed to come up with (which may help you optimise it further) is something like the following:

List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Dubai");
list.Add("Lahor");
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Sarjah");

int most = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Select(grp => grp.Count()).First();
IEnumerable<string> mostVal = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Where(grp => grp.Count() >= most)
    .Select(grp => grp.Key);

This will list the entries that occur most frequently; if two entries have the same frequency, both will be included.

NOTE: we are not selecting every entry that occurs more than once, only those with the highest frequency.
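As one possible optimisation, the list can be counted in a single pass with a dictionary instead of grouping it twice. This is a rough sketch rather than a definitive implementation: WordCounter and MostFrequent are hypothetical names, and sorting the result is an assumption made only because dictionary enumeration order is not guaranteed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class WordCounter
{
    // Counts each word once and returns all words tied for the highest count.
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        int max = 0;

        foreach (var word in words)
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 if the word is new
            int updated = current + 1;
            counts[word] = updated;

            // Track the highest count seen so far while counting.
            if (updated > max)
                max = updated;
        }

        var result = new List<string>();
        foreach (var pair in counts)
        {
            if (pair.Value == max)
                result.Add(pair.Key);
        }

        // Sort for a stable output order (assumption: ordinal, ascending).
        result.Sort(StringComparer.Ordinal);
        return result;
    }
}
```

With the list above, WordCounter.MostFrequent(list) should return Dubai and Sarjah, because each occurs three times.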

Amit