5

Summing up the problem in a single line for clarity as suggested:

I want to find all the things that have different values for some field and are less than a week apart (based on another field)


Let's say I have a Users table and a User class.

Each User has the following fields:

  • SignUpDate a non-null DateTime field
  • UserType an int field, for this question let's say it's either 1 or 0

I'd like to select all the couples of users that signed up less than 7 days apart and have a different user type.

My current (miserable) attempt included an OrderBy(r=>r.SignUpDate) and a .ToList which wasn't a big deal since the number of users for every 2 weeks is not large. (I grabbed all the users from that with overlapping weeks and compared the data).

However, I find my current solution pretty poor. I have no idea what's the correct way to approach this.

I think the core problem here is that I don't know how to address the concept of 'selecting every two corresponding records' in LINQ to Entities after an ordering.

Any help much appreciated, I'm interested in how I would be able to solve this sort of problem without starting my own question in the future.

Example input

 SignUpDate        UserType
 ------------------------------------
 2008-11-11          1
 2008-11-12          0
 2008-11-13          0
 2008-12-13          0
 2008-12-15          1

Example output

Any meaningful way to indicate that the problematic pairs were:

 2008-11-11          1
 2008-11-12          0

(Different by a day and different type)

And

 2008-12-13          0
 2008-12-15          1

(Different by two days and different type)


Here is a related SQL solution I found.

Community
  • 1
  • 1
Benjamin Gruenbaum
  • 270,886
  • 87
  • 504
  • 504
  • Just to get you started, this might help: http://msdn.microsoft.com/en-us/library/dd487052.aspx – Russ Clarke Jun 09 '13 at 23:10
  • @RussC Thanks though LINQ to Entities seems to be able to do `DateTime.Subtract` , my problem is "Finding all the things that have a different value for some field and are less than a week apart". I'll clarify that. – Benjamin Gruenbaum Jun 09 '13 at 23:13

2 Answers2

3

I don't fully understand what problem you are trying to solve, so the following will be general recommendations only. It sounds like any two user sign-ups that are "adjacent in time" and within a week from each other is the rule, but it sounds a bit odd..

Whenever you want to query on a piece of information that is only indirectly available (i.e. something that is not simple a column value), use a projection to select the information you need to solve the problem.

var query = from user in context.Users
            let previousUser = context.Users
                .Where( u => u.SignUpDate < user.SignUpDate )
                .OrderBy( u => u.SignUpDate )
                .FirstOrDefault()
            select new
            {
                User = user,
                PreviousUser = previousUser,
                IsDuplicate = previousUser != null && previousUser.UserType != user.UserType,
                SignUpDaysApart = user.SignUpDate.Subtract( previousUser.SignUpDate )
            };

Once you have the data in a more accessible format, it becomes much easier to write the query to solve the business problem.

var duplicates = (from d in query
                 where d.IsDuplicate && d.SignUpDaysApart.TotalDays <= 7
                 select d).ToList();

I hope the above is inspiration enough for you to arrive at a solution.

Morten Mertner
  • 9,414
  • 4
  • 39
  • 56
  • Thanks, this looks like what I was looking for, is there any way to do the query syntax (the `let` part) in lambda expression syntax? (Maybe a `Select` in between?) Just wondering. – Benjamin Gruenbaum Jun 09 '13 at 23:40
  • Yes, the LINQ query syntax is just compiler sugar on top of the extension methods. Not sure what method you'd use though, sorry :) – Morten Mertner Jun 09 '13 at 23:42
  • FYI http://stackoverflow.com/questions/1092687/code-equivalent-to-the-let-keyword-in-chained-linq-extension-method-calls – Morten Mertner Jun 09 '13 at 23:43
0

Came up with sth like this

        private bool LessThan7DaysApart(DateTime d1, DateTime d2)
        {
            return (d1 - d2).Duration() < new TimeSpan(7, 0, 0, 0);
        }


        private void Match()
        {
            List<User> listfortype0 = users.Where(u => u.UserType == 0).ToList();
            List<User> listfortype1 = users.Where(u => u.UserType == 1).ToList();

            foreach (User u in listfortype0)
            {
                List<User> tmp = listfortype1.Where(u2 => LessThan7DaysApart(u2.SignUpDate, u2.SignUpDate)).ToList();
                if (tmp.Count > 0)
                {
                    List<User> matchedusers = new List<User> { u, tmp[0] };                    
                    listfortype1.Remove(tmp[0]);
                }
            }
        }
Peuczynski
  • 4,591
  • 1
  • 19
  • 33
  • Welcome to StackOverflow! Thanks for the attempt to help me with my question. [Please see this](http://stackoverflow.com/questions/252785/what-is-the-difference-between-iqueryablet-and-ienumerablet). When you perform a ToList() you're pulling _all_ the data from the database and converting it to objects. Querying those objects would be very expensive and not practical. Linq to Entities maps your code to SQL and it runs on the data base. This suggested solution is actually slower than the approach I'm currently using and not satisfied with. Thanks for the effort though. – Benjamin Gruenbaum Jun 09 '13 at 23:30
  • 1
    You don't need .Duration() to create a timespan object, that's the normal output of the - operator on DateTime. Also, instead of creating a new TimeSpan object of 7 days *every* time you evaluate (very expensive), just compare the .Days property of the TimeSpan that's created. With those two changes, you can just use `return (d1 - d2).Days < 7;` – Michael Blackburn Apr 11 '16 at 15:07