0

I have a code that searches people in MongoDB by name. The problem with my code is that its not scaling up, I tried to run this code with 40 threads but when I wacthed the reaction in my DB, I had to close it because it was taking 100% of the CPU. All three fields being searched are indexed in MongoDB. Is there a way to improve the query? I need to run 5 million names and my DB has over 200 million people.

public Person searchPerson(string name)
{
    string MongoUser = "MONGO_USERNAME";
    string MongoPass = "MONGO_PASSWORD";
    string MongoURL  = "MONGO_URL";
    string MongoDB   = "MONGO_DB"; 

    MongoClient _client = new MongoClient("mongodb://" + MongoUser + ":" + MongoPass + "@" + MongoURL);

    IMongoCollection<BazingaPerson> myCollection   = _client.GetDatabase (MongoDB).GetCollection<Person>         ("COLLECTION_NAME");

    List<Person> peopleList = myCollection.AsQueryable<Person>().Where(e => e.Name == name).ToList<Person>();

    // both functions below only transform string like replace, substring or splits . They dont query in a DB or make web requests
    string nameInitials = getInitials(name); 
    string phoneticName = getPhoneticName(name);

    if(peopleList.Count() == 0) peopleList = myCollection.AsQueryable<Person>().Where(e => e.StandardName.Equals (phoneticName)).ToList<Person>();
    if(peopleList.Count() == 0) peopleList = myCollection.AsQueryable<Person>().Where(e => e.Initials.Equals (nameInitials)).ToList<Person>();

    if(peopleList.Count() == 0) return null;

    return peopleList[0];
}
Carlos Siestrup
  • 1,031
  • 2
  • 13
  • 33

2 Answers2

2

I need to run 5 million names and my DB has over 200 million people.

Insert your 5 million records into a temporary table and then run your single query to get the result set. Let the database wonder how to best solve this problem. It's their job and most are quite good at it.

Just compare it to an actual on-paper list of people. You want to give this list to the guy doing the lookup in one piece and let him work on it undisturbed until he's done. What you don't want to do is hire another 40 people that knock on this poor guys door with a single name every few seconds to tell him to look up that single name right now.

nvoigt
  • 75,013
  • 26
  • 93
  • 142
  • I just noticed it's mongoDB and not a traditional database, so my wording might be a bit off. But generally speaking the same philosophy should apply. [MongoDB JOIN](https://stackoverflow.com/questions/2350495/how-do-i-perform-the-sql-join-equivalent-in-mongodb) – nvoigt Jul 27 '18 at 12:28
0

Is the MongoDB process going up to 100% CPU or your application?

If it's the application, here's a couple of hints:

  1. Don't create new MongoDB clients for every query. Keep them in a (thread-local) instance variable.

  2. Methods that hit the network (like your searchPerson method) should preferably be implemented async. ``` public async Task searchPerson(string name) { // ...

    List<Person> peopleList = await Task.Run(() => myCollection.AsQueryable<Person>().Where(e => e.Name == name).ToList<Person>());
    
    // ...
    

    } ```

(maybe you can skip the Task.Run and use an async version of the MongoDB client).

Freek Wiekmeijer
  • 4,556
  • 30
  • 37
  • This is the code from the API the program with 40 threads was consuming the API, it's a test on how many simultaneous requests I could make with my API. But I already found out the problem, there were too many results in initials so I just limited the result. Thanks for the help. – Carlos Siestrup Jul 27 '18 at 17:44