0

In SQL Server, I have a column containing a string formatted as yyyy_d_m_(randominteger)_(randominteger)_blahblahblah string.

So

column
2021_28_6_42_blahblahblahblah
2021_8_12_17_4_blahblahblahblah

etc.

I need to retrieve everything until the third underscore and convert that to a date.

I've tried using variations on:

SELECT
left(column, patindex('%[^0-9]%', column))

However, the trailing integers are varying lengths. So are the day and month since they weren't input with zeros. I'm also running into issues the the underscores as wildcards.

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
elboi
  • 1
  • 2

5 Answers5

1
DECLARE @text VARCHAR(200) = '2021_8_12_17_4_blahblahblahblah';

-- https://stackoverflow.com/questions/8726111/sql-server-find-nth-occurrence-in-a-string
with T as (
    select 0 as row, charindex('_', @text) pos, @text as txt, 1 as occurance
    union all
    select pos + 1, charindex('_', @text, pos + 1), @text, occurance+1
    from T
    where pos > 0
)
select 
    @text, pos , occurance, substring(@text,0,pos) as "DATE"
from T 
where occurance=3
  • First i declare a variable (named @text)
  • With a little help from stackoverflow, the link is one of the result from using Google: for tsl find character in string, and some small changes

The result:

                                  pos         occurance   DATE
--------------------------------- ----------- ----------- ------------
2021_8_12_17_4_blahblahblahblah   10          3           2021_8_12

Where the column under DATE is the piece of text you are looking for.

P.S. It would be prettier to create a function with this code, but ....

Luuk
  • 12,245
  • 5
  • 22
  • 33
1

Compute positions of first two underscores in CROSS APPLY just for brevity.

select datefromparts(substring(t.d, 1, i1 - 1), 
                     substring(t.d, i2 + 1, charindex('_', t.d, i2 + 1) - i2 - 1),
                     substring(t.d, i1 + 1, i2 - i1 - 1)
                     ) dt
from (
       values 
       ('2021_8_12_17_4_blahblahblahblah'),
       ('2021_28_6_42_blahblahblahblah')       
     ) t(d) -- your table here
cross apply( 
       select charindex('_', t.d, 1) i1, charindex('_', t.d, charindex('_', t.d, 1) + 1) i2
     ) t2
Serg
  • 22,285
  • 5
  • 21
  • 48
1

Yet another option with a bit of JSON

Select A.*
      ,AsDate = datefromparts(JSON_VALUE(S,'$[0]')
                             ,JSON_VALUE(S,'$[2]')
                             ,JSON_VALUE(S,'$[1]')
                             )
From YourTable A
Cross Apply ( values ( '["'+replace([column],'_','","')+'"]' ) ) B(S)

Results

column                          AsDate
2021_28_6_42_blahblahblahblah   2021-06-28
2021_8_12_17_4_blahblahblahblah 2021-12-08

Note: If by chance you have double quotes and such in your string, you may have to apply a string_escape()

...
Cross Apply ( values ( '["'+replace(STRING_ESCAPE([column],'json'),'_','","')+'"]' ) ) B(S)
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
0

That is a very bad design. You could still convert it to a date but if performance was an issue then I would use a CLR function.

create table baddata (bad varchar(100));
insert into baddata (bad) values 
('2021_28_6_42_blahblahblahblah'),
('2021_8_12_17_4_blahblahblahblah');

with bd (bad,v,rn) as
(
  select bad, 
      cast(v as int), 
      row_number() over (partition by bad order by bad)
  from baddata
  cross apply (select top(3) value 
          from string_split(baddata.bad,'_')) t(v)
),
ydm (bad, y,d,m) as
(
  select bad, 
     sum(cast(case when rn=1 then v end as int)) y,
     sum(cast(case when rn=2 then v end as int)) d,
     sum(cast(case when rn=3 then v end as int)) m
  from bd
  group by bad
)
select bad, datefromparts(y,m,d) [date]
from ydm;
Cetin Basoz
  • 22,495
  • 3
  • 31
  • 39
  • MS `string_split` docs say : The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql – Serg Jul 03 '21 at 16:38
  • @Serg, I didn't know that STRING_SPLIT() in SQL server had such a buggy implementation. I am not surprised, I wonder which great mind in MS decided it shouldn't guarantee the order. – Cetin Basoz Jul 04 '21 at 12:39
  • I was really surprised when come across that fact too. – Serg Jul 04 '21 at 13:01
0

You need to compute the position of the underscore in the first 3 positions. The following works by successively CROSS APPLY'ing to get the CHARINDEX positions. Something like this

select datefromparts(ci2.dt_yr, ci4.dt_mo, ci3.dt_day) dt_from_parts
from (values ('2021_8_12_17_4_blahblahblahblah'),
             ('2021_28_6_42_blahblahblahblah')) string(dt) 
     cross apply (values (charindex('_', string.dt, 1))) ci1(loc)
     cross apply (values (charindex('_', string.dt, ci1.loc+1),
                          left(string.dt, ci1.loc-1))) ci2(loc, dt_yr)
     cross apply (values (charindex('_', string.dt, ci2.loc+1),
                          substring(string.dt, ci1.loc+1, (ci2.loc-ci1.loc)-1))) ci3(loc, dt_day)
     cross apply (values (substring(string.dt, ci2.loc+1, (ci3.loc-ci2.loc)-1))) ci4(dt_mo);
dt_from_parts
2021-12-08
2021-06-28

Suppose this code is unsightly and too much to look at. It could be extracted into a SCHEMABOUND inline table valued function which you could use CROSS APPLY with to simplify the query.

drop function if exists dbo.test_fnExtractDate;
go
create function dbo.test_fnExtractDate(
    @input_string            varchar(64))
returns table with schemabinding as return 
select datefromparts(ci2.dt_yr, ci4.dt_mo, ci3.dt_day) dt_from_parts
from (values (@input_string)) string(dt) 
     cross apply (values (charindex('_', string.dt, 1))) ci1(loc)
     cross apply (values (charindex('_', string.dt, ci1.loc+1),
                          left(string.dt, ci1.loc-1))) ci2(loc, dt_yr)
     cross apply (values (charindex('_', string.dt, ci2.loc+1),
                          substring(string.dt, ci1.loc+1, (ci2.loc-ci1.loc)-1))) ci3(loc, dt_day)
     cross apply (values (substring(string.dt, ci2.loc+1, (ci3.loc-ci2.loc)-1))) ci4(dt_mo);
go

Then the query could look like this

select fex.*
from (values ('2021_8_12_17_4_blahblahblahblah'),
             ('2021_28_6_42_blahblahblahblah')) string(dt) 
    cross apply dbo.test_fnExtractDate(string.dt) fex
dt_from_parts
2021-12-08
2021-06-28
SteveC
  • 5,955
  • 2
  • 11
  • 24