I have connected Entity Framework with a mysql database in an asp.net Webapi project. We have a table "recipes" and the goal is to select some random recipes. First we select some recipes which have certain ingredients, after that we take five random recipes. This was achieved with following extension method (only selecting random elements).
public static IEnumerable<Recipe> TakeRandom(this IQueryable<Recipe> recipes, int count, bool fastWay = true) {
if (count <= 0)
throw new Exception("Cant draw a negative number of recipes");
int left = count;
int tries = 0;
ISet<int> taken = new HashSet<int>();
Random r = new Random();
int C = 0;
IList<int> possibleIds = null;
if (!fastWay) {
possibleIds = recipes.Select(recipe => recipe.RecipeId).ToList();
C = possibleIds.Count;
} else
C = recipes.Max(recipe => recipe.RecipeId);
while (left > 0 && tries < count * count) {
int id = 0;
if (!fastWay)
id = possibleIds[r.Next(C)];
else
id = r.Next(C);
var item = recipes.FirstOrDefault(rr => rr.RecipeId == id);
if (item != null && !taken.Contains(item.RecipeId)) {
taken.Add(item.RecipeId);
yield return item;
left--;
}
tries++;
}
}
We are currently facing a performance issue: recipes have ingredients and tags each of which have their own table. It's a many to many relationship. Currently there are only ~300 recipes in the database, with around 800 unique ingredients. There are are over 5000 rows in the joined tables. It can take over 30 seconds (we use the free tier mysql database in azure since we're students) for the server to respond. But I don't think that the database is the problem because we query too much rows and the selection of random entities is not great. Notice that we do outputcaching, so only one user every hour has a very slow connection.
I have tried the following to optimize the performance.
- EF Code First: How to get random rows: Will not work because the function Guid.NewGuid() is not defined for Mysql. Also it is not performant because of the sort.
- Used newid() mysql function: It's clear to me that orderby is not a performant solution.
Is it possible to optimize the way I'm selecting these random elements after a where clause is specified?
Thanks in advance