1

I'm trying to join two files, one at daily granularity and one at weekly granularity.

I currently have this:

var combinedTrends = from daily in _dailyDict
                     from weekly in _weeklyDict
                     where DateTime.Parse(daily.Key) >= DateTime.Parse(weekly.Key.Substring(0, 10)) && DateTime.Parse(daily.Key) <= DateTime.Parse(weekly.Key.Substring(13, 10))
                     select new GoogleTrends
                     {
                         Date = DateTime.Parse(daily.Key),
                         WeekStart = DateTime.Parse(weekly.Key.Substring(0, 10)),
                         WeekEnd = DateTime.Parse(weekly.Key.Substring(13, 10)),
                         DailyIndex = daily.Value,
                         WeeklyIndex = weekly.Value
                     };

Problem: I'm losing daily rows where the week does not exist in weekly file.

Input:

1/1/2014    91
1/2/2014    82
1/3/2014    89
1/4/2014    100
1/5/2014    87
1/6/2014    64
1/7/2014    64
1/8/2014    64
1/9/2014    67
etc....

2014-01-05 - 2014-01-11 37
2014-01-12 - 2014-01-18 34
2014-01-19 - 2014-01-25 34
etc.....

Expected Output

Date        WeekStart   WeekEnd     D   W
01-01-2014  (empty)     (empty)     91  (empty)
01-02-2014  (empty)     (empty)     82  (empty)
01-03-2014  (empty)     (empty)     89  (empty)
01-04-2014  (empty)     (empty)     100 (empty)
01-05-2014  01-05-2014  01-11-2014  87  37
01-06-2014  01-05-2014  01-11-2014  64  37
01-07-2014  01-05-2014  01-11-2014  64  37
01-08-2014  01-05-2014  01-11-2014  64  37
01-09-2014  01-05-2014  01-11-2014  67  37
etc....
Kyle
  • 5,407
  • 6
  • 32
  • 47
  • Why don't you check it between two values? I.e. `dateTime > weekBegin && dateTime < weekEnd` – Der Kommissar May 13 '15 at 15:03
  • Please add more information to your question: `include just enough code to allow others to reproduce the problem.` http://stackoverflow.com/help/how-to-ask Also, please include *current programme input and output* and *expected output*. – Der Kommissar May 13 '15 at 15:14
  • @EBrown More info in edit. – Kyle May 13 '15 at 15:24
  • You should use a `JOIN` in this case, which will allow you to select all `daily.Value` and the relevant `weekly` values. https://msdn.microsoft.com/en-us/library/bb311040.aspx You would want `LEFT OUTER JOIN`. Also, can you provide definitions for the `_dailyDict` and `_weeklyDict` variables? I may be able to come up with something for you. – Der Kommissar May 13 '15 at 15:33
  • @EBrown Both are Dictionary. In _dailyDict, string is then parsed to DateTime. In _weeklyDict, string has to be split to parse WeekStart and WeekEnd. – Kyle May 13 '15 at 15:37

2 Answers2

2

Alright, so this wasn't as easy as a simple JOIN, and could surely be made faster, but here's the solution I came up with:

Solution 1:

Calendar cal = DateTimeFormatInfo.CurrentInfo.Calendar;

var combinedTrends = from daily in _dailyDict
                        join weeklySub in _weeklyDict on cal.GetWeekOfYear(DateTime.Parse(daily.Key), CalendarWeekRule.FirstDay, DayOfWeek.Sunday) equals cal.GetWeekOfYear(DateTime.Parse(weeklySub.Key.Substring(0, 10)), CalendarWeekRule.FirstDay, DayOfWeek.Sunday) into weeklyGroup
                        from weekly in weeklyGroup.DefaultIfEmpty(new KeyValuePair<string, int>(DateTime.Parse(daily.Key).ToString("yyyy-MM-dd") + " - " + DateTime.Parse(daily.Key).ToString("yyyy-MM-dd"), 0))
                        select new GoogleTrends
                        {
                            Date = DateTime.Parse(daily.Key),
                            WeekStart = DateTime.Parse(weekly.Key.Substring(0, 10)),
                            WeekEnd = DateTime.Parse(weekly.Key.Substring(13, 10)),
                            DailyIndex = daily.Value,
                            WeeklyIndex = weekly.Value
                        };

Essentially, we take your original query, and add the following to it:

join weeklySub in _weeklyDict on cal.GetWeekOfYear(DateTime.Parse(daily.Key), CalendarWeekRule.FirstDay, DayOfWeek.Sunday) equals cal.GetWeekOfYear(DateTime.Parse(weeklySub.Key.Substring(0, 10)), CalendarWeekRule.FirstDay, DayOfWeek.Sunday) into weeklyGroup
from weekly in weeklyGroup.DefaultIfEmpty(new KeyValuePair<string, int>(DateTime.Parse(daily.Key).ToString("yyyy-MM-dd") + " - " + DateTime.Parse(daily.Key).ToString("yyyy-MM-dd"), 0))

What this does, is tells LINQ to check that the week number of the daily item matches the week number of the weekly item, and if so return it. Otherwise it returns the same date as the daily record as the start and end of the week, with a 0 value.

Solution 2:

Now, we could add helper methods which would get the first and last day of the week, to send a right-proper string to the weekly item with, and modify the query for it:

// We use this function because .NET does not implicitly consider the first and last weeks of the year the same.
public static int GetWeekOfYear(int weekNumber)
{
    if (weekNumber > 52)
        return 1;

    return weekNumber;
}

// This will get the week-range in a string of format "yyyy-MM-dd - yyyy-MM-dd".
public static string GetWeekRange(DateTime date, DayOfWeek firstDayOfWeek)
{
    int numberOfDayOfWeek = DayOfWeekToNumber(date.DayOfWeek, firstDayOfWeek);

    return date.AddDays(0 - numberOfDayOfWeek).ToString("yyyy-MM-dd") + " - " + date.AddDays(6 - numberOfDayOfWeek).ToString("yyyy-MM-dd");
}

// This converts the DayOfWeek to a number to allow us to easily determine how many days to add/subtract for the first/last days of the week.
public static int DayOfWeekToNumber(DayOfWeek dayOfWeek, DayOfWeek firstDayOfWeek)
{
    int[] r = new int[2];
    r[0] = 0;
    r[1] = 0;

    r[0] = DayOfWeekValue(firstDayOfWeek);
    r[1] = DayOfWeekValue(dayOfWeek);

    int result = r[1] - r[0];
    if (result < 0)
        result += 7;

    return result;
}

// This converts the DayOfWeek enum into a numerical value, Sunday being 0 and up from there.
public static int DayOfWeekValue(DayOfWeek dayOfWeek)
{
    switch (dayOfWeek)
    {
        case DayOfWeek.Sunday:
            return 0;
        case DayOfWeek.Monday:
            return 1;
        case DayOfWeek.Tuesday:
            return 2;
        case DayOfWeek.Wednesday:
            return 3;
        case DayOfWeek.Thursday:
            return 4;
        case DayOfWeek.Friday:
            return 5;
        case DayOfWeek.Saturday:
            return 6;
    }

    return -1;
}

Then we complete our Query:

Calendar cal = DateTimeFormatInfo.CurrentInfo.Calendar;

DayOfWeek firstDayOfWeek = DayOfWeek.Sunday;
var combinedTrends = from daily in _dailyDict
                        join weeklySub in _weeklyDict on GetWeekOfYear(cal.GetWeekOfYear(DateTime.Parse(daily.Key), CalendarWeekRule.FirstDay, firstDayOfWeek)) equals GetWeekOfYear(cal.GetWeekOfYear(DateTime.Parse(weeklySub.Key.Substring(0, 10)), CalendarWeekRule.FirstDay, firstDayOfWeek)) into weeklyGroup
                        from weekly in weeklyGroup.DefaultIfEmpty(new KeyValuePair<string, int>(GetWeekRange(DateTime.Parse(daily.Key), firstDayOfWeek), 0))
                        select new GoogleTrends
                        {
                            Date = DateTime.Parse(daily.Key),
                            WeekStart = DateTime.Parse(weekly.Key.Substring(0, 10)),
                            WeekEnd = DateTime.Parse(weekly.Key.Substring(13, 10)),
                            DailyIndex = daily.Value,
                            WeeklyIndex = weekly.Value
                        };

Personally, it is up to you as which method you should take, I think that the latter I mentioned would probably be best.

The output, of this solution by the way, is:

Date       WeekStart  WeekEnd    D   W
01-01-2014 12-29-2013 01-04-2014 91  0
01-02-2014 12-29-2013 01-04-2014 82  0
01-03-2014 12-29-2013 01-04-2014 89  0
01-04-2014 12-29-2013 01-04-2014 100 0
01-05-2014 01-05-2014 01-11-2014 87  37
01-06-2014 01-05-2014 01-11-2014 64  37
01-07-2014 01-05-2014 01-11-2014 64  37
01-08-2014 01-05-2014 01-11-2014 64  37
01-09-2014 01-05-2014 01-11-2014 67  37

Note: As indicated in the comments, the GetWeekOfYear(int) function exists because .NET will start a partial week as the first (1) week of the year, but will not end a partial week as the first, but instead as week 53. This function returns 1 in that case, to allow for the two partial weeks to be joined properly.

Also, this was actually a fun problem to solve.

Test Environment:

The following code and comments were used as indicated to test these solutions:

Dictionary<string, int> _dailyDict = new Dictionary<string, int>();

_dailyDict.Add("12/31/2013", 91);
_dailyDict.Add("1/1/2014", 91);
_dailyDict.Add("1/2/2014", 82);
_dailyDict.Add("1/3/2014", 89);
_dailyDict.Add("1/4/2014", 100);
_dailyDict.Add("1/5/2014", 87);
_dailyDict.Add("1/6/2014", 64);
_dailyDict.Add("1/7/2014", 64);
_dailyDict.Add("1/8/2014", 64);
_dailyDict.Add("1/9/2014", 67);

Dictionary<string, int> _weeklyDict = new Dictionary<string, int>();

// This line was removed and added to ensure that weeks that didn't exist would still return daily's.
//_weeklyDict.Add("2013-12-29 - 2014-01-04", 1);
_weeklyDict.Add("2014-01-05 - 2014-01-11", 37);
_weeklyDict.Add("2014-01-12 - 2014-01-18", 34);
_weeklyDict.Add("2014-01-19 - 2014-01-25", 34);

The following code was used to generate output:

foreach (var trend in combinedTrends)
{
    Console.WriteLine(trend.Date.ToString("MM-dd-yyyy") + "\t" + trend.WeekStart.ToString("MM-dd-yyyy") + "\t" + trend.WeekEnd.ToString("MM-dd-yyyy") + "\t" + trend.DailyIndex.ToString() + "\t" + trend.WeeklyIndex.ToString());
}

References:

https://msdn.microsoft.com/en-us/library/bb311040.aspx

Get the correct week number of a given date

https://msdn.microsoft.com/en-us/library/system.globalization.calendar.getweekofyear.aspx

Community
  • 1
  • 1
Der Kommissar
  • 5,848
  • 1
  • 29
  • 43
-2

Try using a join

var combinedTrends = from daily in _dailyDict
                                 join weekly in _weeklyDict
                                 on DateTime.Parse(daily.Key) equals DateTime.Parse(weekly.Key.Substring(0, 10))
                                 select new GoogleTrends
                                 {
                                     Date = DateTime.Parse(daily.Key),
                                     WeekStart = DateTime.Parse(weekly.Key.Substring(0, 10)),
                                     WeekEnd = DateTime.Parse(weekly.Key.Substring(13, 10)),
                                     DailyIndex = daily.Value,
                                     WeeklyIndex = weekly.Value
                                 };
jdweng
  • 33,250
  • 2
  • 15
  • 20