2

How can I replicate the behavior of this SQL query in ABP framework? The SQL query is:

SELECT * FROM App_Profile
ORDER BY NEWID()

I tried using the following LINQ expression but it did not work - I want to get random records each time, but I got always the same results instead:

query.Skip(input.SkipCount).Take(input.MaxResultCount).OrderBy(x => Guid.NewGuid());
var profiles= await _asyncExecuter.ToListAsync(query);
  • Concrete example: I have a dataset of 100 questions, and an endpoint that returns 30 questions, each with a collection of potential responses. My goal is to ensure that each time the user invokes the endpoint, it returns a different set of questions with their potential responses, regardless of whether any questions happen to exist in both the first and second results.

P.S : I'm using ABP framework, EF Core 6

  • After trying I got to this solution It seems the problem lies with await _asyncExecuter.ToListAsync(query) and I'm not sure why. I've tried using var questions = query.OrderBy(x => Guid.NewGuid()).Take(input.MaxResultCount).ToList(); instead.
byteram
  • 125
  • 8
  • @Tatranskymedved I want to get random records each time from the database I call a specific method, but I got the same result each time – byteram Apr 23 '23 at 18:02
  • 1
    `ORDER BY NEWID()` in SQL means [tag:shuffle] that's the keyword you want to look for. You don't want to literally translate it to C#. – Thomas Apr 23 '23 at 19:01
  • @Thomas, no. That requires the data to be loaded. I want to create a LINQ query that will be translated into the SQL query mentioned above. – byteram Apr 23 '23 at 20:26
  • You show a simple `SELECT *`. You reproduce it by a query containing Skip/Take. That's a major difference. It's not clear if the records should be "shuffled" before or after Skip/Take. Also, please tell which database provider this is about and show the generated SQL. – Gert Arnold Apr 23 '23 at 20:55
  • @GertArnold I'm actually trying to shuffle the results before using Skip/Take because I want to get a different result each time, instead of taking them in the same order as the database. For the database provider, I'm using SQL Server. – byteram Apr 23 '23 at 21:38

3 Answers3

2

Tatranskymedved has answered your question, but to expand on the elements you appear to be confused about:

Use the following to only load the selected items, not everything into memory first:

var profiles = await query.OrderBy(x => Guid.NewGuid())
    .Skip(input.SkipCount)
    .Take(input.MaxResultCount)
    .ToListAsync();

Now the real question is what are you trying to accomplish here? Randomly ordering the entire set of rows then using skip and take as a DB query doesn't make any sense. Sure, this will take a page of data from a random set, but if your goal is to load pages of data that has been randomized, this won't work as you probably expect as each query will re-randomize the set so you can, and will get back the same item in multiple paged calls. The ordering needs to be persisted between paging calls. If you just want 100 random items out of an entire set, you don't need Skip, just use Take to get the first 100 randomized items.

I don't know what _asyncExecutor is, but I'm pretty certain it isn't necessary unless it is a wrapper to log output or such, but I'd suspect it is meant to do something like wrap a synchronous operation to be treated as an asynchronous one. (Not needed as EF supports async operations)

To explain what you are seeing with your examples:

query.Skip(input.SkipCount)
    .Take(input.MaxResultCount)
    .OrderBy(x => Guid.NewGuid());

var profiles= await query.ToListAsync();

Given a Skip value of 100 and a MaxResultCount of 10 then this would always take rows 101-110 then randomly order those 10 results rather than randomly order the whole set. The additional trouble with this approach is that the assumed default order that the data will be read in is not reliable and will vary as data is added/removed from the set. It will start by looking like the default order will be the ID or order the rows were added, but as the set grows and changes, that will not be repeatably reliable.

Concrete example: I have a dataset of 100 questions and an endpoint that returns 30 questions. What I want to do is to ensure that each time the user invokes the endpoint, it will return a different set of questions, regardless of whether there are any questions that happen to exist in both the first and second results.

If you want to select 30 random questions from a data set of 100 questions and you don't care if questions can be repeated between calls:

var profiles = await query.OrderBy(x => Guid.NewGuid())
    .Take(input.MaxResultCount)
    .ToListAsync();

This is all you need.

If you want to ensure that the next 30 questions cannot include questions the user has already tried, then the best way to ensure this is to cache the question IDs that you have already selected, and exclude that from the set:

Initial state:

List<int> questionIdsAsked = (List<int>)Session[nameof(questionIdsAsked)] ?? new List<int>();

if(questionIdsAsked.Any())
    query = query.Where(x => !questionIdsAsked.Contains(x.Id));

var questions = await query.OrderBy(x => Guid.NewGuid())
    .Take(input.MaxResultCount)
    .ToListAsync();

questionIdsAsked.AddRange(questions.Select(x => x.Id));
Session[nameof(questionIdsAsked)] = questionIdsAsked;

This assumes a web application, but if an application the questionIdsAsked can just be a private member that can be cleared if necessary. This checks to see if the current user has been given a set of questions. On the first run for the session we get the first 30 questions from the database and record those question IDs to the session state. This way when we call it again we get the 30 question IDs from the previous run and exclude those from the query before re-randomizing and taking 30 new questions. Obviously you will need to handle the scenario where you might run out of questions or enough for getting a full set of 30 if you take this approach.

Edit: Note that if you are eager loading responses or other data, you cannot use AsSplitQuery with this method. This would result in a random set of Responses to questions being read so some questions might get their answers loaded while others do not. With Split Queries EF would generate two queries which would look something like:

SELECT TOP 10 * FROM Questions
ORDER BY NEWID()

SELECT r.* FROM Responses r
INNER JOIN Questions q ON r.QuestionId = q.Id
WHERE q.Id IN (SELECT TOP 10 Id FROM Questions 
ORDER BY NEWID())

The first query would fetch 10 random questions, but for split queries to work reliably the sorting has to be identical, which in this case it is not. It will load responses for a different 10 random questions.

You either need to accept the cost of the Cartesian Product of the questions and their responses, or you could consider performing the Split query manually:

// Note: No eager load.
var questions = await _context.Questions
    .OrderBy(x => Guid.NewGuid())
    .Skip(input.SkipCount)
    .Take(input.MaxResultCount)
    .ToListAsync();
var questionIds = questions.Select(x => x.Id).ToList();
var responses = await _context.Responses
    .Where(x => questionIds.Contains(x.QuestionId))
    .ToListAsync();

Better would be to project the question and the response details you need into a DTO to reduce the query Cartesian as much as possible so that you don't have to resort to Include and AsSplitQuery.

Steve Py
  • 26,149
  • 3
  • 25
  • 43
  • Thanks. It seems the problem lies with await _asyncExecuter.ToListAsync(query) and I'm not sure why. I've tried using var questions = query.OrderBy(x => Guid.NewGuid()).Take(input.MaxResultCount).ToList(); instead, but now I'm facing another issue where a collection of responses with the question is not being eagerly loaded properly. Sometimes the first question is loaded without its corresponding response. – byteram Apr 24 '23 at 11:41
  • If you want the responses you need to eager load them with `Include(x => x.Responses)`. What you are probably seeing is that if you don't eager load with `include` EF will still populate references it happens to be tracking. This will result in inconsistent and incomplete data appearing available if lazy loading is disabled. EF will show whatever happens to be tracked at that stage from previous reads. – Steve Py Apr 24 '23 at 12:07
  • I already have ```include(x => x.Responses)``` in my code. At first, everything seems to load properly, but after executing the code multiple times, the first question is sometimes loaded without its responses. – byteram Apr 24 '23 at 12:19
  • If you are building a query expression up, make sure you haven't forgotten an assignment. I.e. `if (includeResponses) query = query.Include(x => x.Responses);` vs. accidentally just writing: `if (includeResponses) query.Include(x => x.Responses);` This is a common "gotcha" that can be easy to miss. – Steve Py Apr 25 '23 at 01:58
  • No, it's related to ```OrderBy(x => Guid.NewGuid())``` because when I remove it, I get the questions with their responses every time. However, when I add it, sometimes the responses are missing. and this is my query ```questions.Include(x => x.Responses).OrderBy(x => Guid.NewGuid()).Take(10).ToListAsync()``` – byteram Apr 25 '23 at 11:39
  • That is very odd. Run your query with the OrderBy and capture the IDs of the 10 questions it picks where some of them are missing their responses, then load those same 10 items without the OrderBy using a `Where` clause. `questions.Include(x=> x.Responses).Where(x => questionIds.Contains(x.Id)).ToList()` where the questionIds is an array containing the 10 random question IDs having some missing responses. Verify that the responses are loaded without the order by. I suspect you may have a bug in your entity config or there is a bug in EF's handling of that ordering. – Steve Py Apr 25 '23 at 21:07
  • Also, are you using `AsSplitQuery`? Using a randomized ordering I suspect AsSplitQuery will not work in this case as it would most likely be picking random responses alongside random questions due to the nature of how the split query generation works. Ideally you should consider projection to reduce Cartesian costs if loading entity graphs for sets of results. – Steve Py Apr 25 '23 at 21:12
1

The issue with that is, you first select the items (always same set) and then order them randomly.

Change the order of Linq calls and invoke the OrderBy() just before you start taking the elements:

query.OrderBy(x => Guid.NewGuid())
     .Skip(input.SkipCount)
     .Take(input.MaxResultCount);
Tatranskymedved
  • 4,194
  • 3
  • 21
  • 47
  • I did it but nothing change I got the same result every time at the same order!! – byteram Apr 23 '23 at 18:13
  • How big is your set and how many objects are you skipping/taking? – Tatranskymedved Apr 23 '23 at 18:15
  • How big is your set ? 1000, how many objects are you skipping/taking? skipping (100) taking (50) – byteram Apr 23 '23 at 18:16
  • I got a random result but the problem I want to get them from the database ready not load them then order them – byteram Apr 23 '23 at 18:26
  • 1
    But you are asking for something else in the question - you asked how to get random results everytime.. now you are putting more conditions and asking for different approach -> you should either put all details in the question at the start OR raise a new question for the topic. – Tatranskymedved Apr 23 '23 at 18:29
1

I cannot reproduce the problem (that is, I get result from the database). I am using EF Core 7 and SQL Server. Specifically, I have the following code:

IQueryable<Agency> agencies = _db.Agency.OrderBy(a => Guid.NewGuid()).Skip(3).Take(2); 
Console.WriteLine(agencies.ToQueryString());
foreach (Agency agency in agencies.ToList())
{
    Console.WriteLine($"{agency.AgencyId} - {agency.AgencyName}");
}

I see the following query that gets generated:

DECLARE @__p_0 int = 3;
DECLARE @__p_1 int = 2;

SELECT [a].[agencyId], [a].[agencyName]
FROM [Agency] AS [a]
ORDER BY NEWID()
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

and I get random 2 rows from the database. Although, since the rows are random, I am not sure what is the need for Skip()

Felix
  • 9,248
  • 10
  • 57
  • 89