2

I'm currently in the middle of developing a reporting application with SQL Server on the back-end.

Data is ingested from csv files into tables. One of the files (with financial data) contains a financial period in the format of MM.YYYY (04.2015, 07.2016, etc.).

Later on people will want to query that data based on that period, for example anything between 04.2015 and 03.2016. Obviously running a between on strings will give you either no result or incorrect data.

What is the best way to handle this?

  • at the time of import convert to date (keeping in mind that datetime data type requires day)

  • cast/convert the string to date in the WHERE clause (how would you handle MM.YYYY in that case)

any other solutions?

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
  • 2
    I'd convert to date at the time of import. – Gurwinder Singh Jan 27 '17 at 10:18
  • so I pere ur question csv will have monthly data right ? so we can add day to this feild based on the month lets say last day of that month ? and then later in query we can use convert – Yashveer Singh Jan 27 '17 at 10:19
  • How are you importing data from csv to SQL ? – Chetan Jan 27 '17 at 10:24
  • That's not a date. You can't have a date without a day. The best option is to use two separate fields for year and month – Panagiotis Kanavos Jan 27 '17 at 11:11
  • your biggest problem here, in my opinion, is that the format does not order in the same order as the thing it represents - e.g 09.1980 is greater than 01.2017. If you do want to use date related strings that are not date type, always make sure you have the most significant date part at the left e.g YYYY.MM - that way you can readily use indexes for sorting etc. You can make them into a date if you want to, the philosophical problem there is that the notion of 01-MM-YYYY is in fact an arbitrary invention of a day to make it fit a particular type – Cato Jan 27 '17 at 11:41

2 Answers2

1

You're right on that storing them as strings will make querying slow and difficult. Storing these as proper dates allows indexing and much easier querying. If a date like 04.2015 indicates the period of 2015-04-01 to 2015-04-30, then I'd store period_start and period_end date columns. That makes it abundantly clear in the data what these dates represent. Checkout these options for figuring out the last day of a month.

One way to parse these into proper dates is like so using the German locale (dd.mm.yyyy). You could just prepend '01.' onto your dates.

select convert(date, '01.05.2016', 104)
Community
  • 1
  • 1
mroach
  • 2,403
  • 1
  • 22
  • 29
  • Would be a waste to use 2 columns to store the same information – t-clausen.dk Jan 27 '17 at 10:44
  • It's not the same information since the input isn't a date, it's a period. Storing the 1st or last day of the month would actually generate a bogus day value. One design option is to store the year and date separately. Or treat the data as a range and specify its start, end dates. Storing the range allows the use of date functions that may or may not be required – Panagiotis Kanavos Jan 27 '17 at 11:14
  • Thanks @mroach. The two columns approach is the most feasible solution in my case-scenario anyway. –  Jan 27 '17 at 12:10
  • @PanagiotisKanavos storing the start of a month and the end of a month from the same varchar column "032017" as 2017-03-01, 2017-03-31 is using 2 columns to store the same information. You are getting no additional values from the end of the month – t-clausen.dk Jan 27 '17 at 12:59
  • `032017` is just an unsortable, unorderable string, it doesn't provide any date or period information. You could argue about `201703` but that still takes more storage than eg storing the year and month in separate columns. You are actually *breaking* normalization this way. If you have to translate the value, something is wrong. PS: I'm sure Joe Celko has an article on this matter – Panagiotis Kanavos Jan 27 '17 at 14:04
  • @t-clausen.dk The two column approach allows for data they may be more narrow. Granted it might be slight overkill for this situation, but you don't *have* to use both columns as long as you know what the data represents. I advocate storing the data in date fields for ease of querying. Using the `BETWEEN` operator with dates is clean and easy. – mroach Jan 27 '17 at 15:27
1

I prefer first option, at the time of import convert to date. You can use first day of month or last day of month as day. then you can simple filter the date between.

Another simple method is, insert the financial month data to an Integer column in the following format

(YYYY*100)+MM
ie 04.2015 = (2015*100)+04 = 201504

Then you can simply filter between in this integer column in WHERE clause

... WHERE Fin_Month BETWEEN 201504 AND 201706
Abdul Rasheed
  • 6,486
  • 4
  • 32
  • 48