0

I have a column of strings, called "MyStrings" like the following:

...
Foo bar Jul15 blah blah.xlsx
Choo bar Jul-15 blah far.xlsx
Star bar 10-Jul-15 blah far.xlsx
Car Star bar 10.Jul.2015 blah far.xlsx
...
...

I'd like to do string manipulation so all dates, whatever format, are not included in the results.

So the following query:

SELECT  results = <manipulated "MyStrings">
FROM    aTable

Should have these results:

...
Foo bar  blah blah.xlsx
Choo bar  blah far.xlsx
Star bar  blah far.xlsx
Car Star bar  blah far.xlsx
...
...

Is there a quick way of doing this or do I need to consider each format individually?

whytheq
  • 34,466
  • 65
  • 172
  • 267
  • In your example the month is always 3 characters, so maybe you could use that. Search for 'words' (units delimited by spaces), containing 3 letters and (some?) digits. – HoneyBadger Nov 25 '15 at 16:25
  • 2
    don't think you're gonna find a simple solution to this with multiple formats. – Tanner Nov 25 '15 at 16:28
  • @Tanner ye of little faith...I'm still hopeful – whytheq Nov 25 '15 at 16:29
  • Good luck. This is going to require human intervention to pull this off. There just way too many possibilities when the format is all over the place like that. – Sean Lange Nov 25 '15 at 16:29
  • @SeanLange so many pessimists ....someone out there might have bumped into this before – whytheq Nov 25 '15 at 16:30
  • 1
    LOL the difference between a pessimist and an optimist? Experience. :) Just from the very controlled values you posted this is challenging. From what is likely all kinds of formats this is just not feasible to do accurately. July, Jul, Jun, June, Jan, January etc...and that doesn't even take into account fat fingering. – Sean Lange Nov 25 '15 at 16:32
  • 1
    @SeanLange fat fingering... i'm gonna steal that one! – Tanner Nov 25 '15 at 16:36
  • @SeanLange there is something like mysql [STR_TO_DATE()](http://www.mysqltutorial.org/mysql-str_to_date/) on MSSQL? – Juan Carlos Oropeza Nov 25 '15 at 16:40
  • @JuanCarlosOropeza there is nothing like that function in sql server that I know of. That looks pretty robust. There is DateFromParts https://msdn.microsoft.com/en-us/library/hh213228.aspx but it can't handle all the extra characters that MySql function can. – Sean Lange Nov 25 '15 at 16:45
  • @SeanLange Ok guys check my version. I think is a good start. Feel free to offer hint to improve. – Juan Carlos Oropeza Nov 25 '15 at 17:34

1 Answers1

1

You need a Split function

If you split first by <space> is easy create regular expresion for

monDD
mon-DD
DD-mon-YY
DD-mon-YYYY

SQL Fiddle Demo

WITH splitCTE AS (
  SELECT s.[id], f.Number, f.Item
  FROM dbo.SourceData AS s
  CROSS APPLY dbo.SplitStrings(s.[test], ' ') as f
)
  SELECT *,
         CASE
             WHEN item Like 'Jul[0-9][0-9]' THEN 'mmmdd'
             WHEN item Like 'Jul-[0-9][0-9]' THEN 'mmm-dd'
             WHEN item Like '[0-9][0-9]-Jul-[0-9][0-9]' THEN 'dd-mmm-yy'
             WHEN item Like '[0-9][0-9].Jul.[0-9][0-9][0-9][0-9]' THEN 'dd.mmm.yyyy'
             ELSE ''
         END matchType         
  FROM splitCTE

OUTPUT

  • Need a join with list of 3 char months to replace the wired Jul.
  • Easy expand to also include a version with full month name.
  • Will match Jul77 as mmmdd but is a start.
  • You can calculate a IsValidDate column in another step
    • For some of the format you can use CONVERT to check for a valid date
    • For other like Jul77 you can separate first 3 char with last 2 and try to get a date.

.

| id | Number |        Item |   matchType |
|----|--------|-------------|-------------|
|  1 |      1 |         Foo |             |
|  1 |      2 |         bar |             |
|  1 |      3 |       Jul15 |       mmmdd |
|  1 |      4 |        blah |             |
|  1 |      5 |   blah.xlsx |             |
|  2 |      1 |        Choo |             |
|  2 |      2 |         bar |             |
|  2 |      3 |      Jul-15 |      mmm-dd |
|  2 |      4 |        blah |             |
|  2 |      5 |    far.xlsx |             |
|  3 |      1 |        Star |             |
|  3 |      2 |         bar |             |
|  3 |      3 |   10-Jul-15 |   dd-mmm-yy |
|  3 |      4 |        blah |             |
|  3 |      5 |    far.xlsx |             |
|  4 |      1 |         Car |             |
|  4 |      2 |        Star |             |
|  4 |      3 |         bar |             |
|  4 |      4 | 10.Jul.2015 | dd.mmm.yyyy |
|  4 |      5 |        blah |             |
|  4 |      6 |    far.xlsx |             |

Then use your favorite XML PATH to join back without the matching elements

Community
  • 1
  • 1
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118