0

My table contain column [File] with names of files

I have files like :

U_1456789_23456789_File1_automaticrepair

U_3456789_3456789_File2_jumpjump

B_1134_445673_File3_plane

I_111345_333345_File4_chupapimonienio

P_1156_3556_File5 idk what

etc...

I want to create column where i will see only bolded values, how i can do that ?

jarlh
  • 42,561
  • 8
  • 45
  • 63
Mgkraft
  • 5
  • 3
  • 2
    Which RDBMS system are you using? Not all of them support regular expressions. – D M Sep 15 '21 at 14:07

1 Answers1

1

If your RDBMS supports it, a regular expression is a much cleaner solution. If it doesn't, (and SQL Server doesn't by default) you can use a combination of SUBSTRING and CHARINDEX to get the text in the column between the second and third underscores as explained in this question.

Assuming a table created as follows:

CREATE TABLE [Files] ([File] NVARCHAR(200));
INSERT INTO [Files] VALUES 
('U_1456789_23456789_File1_automaticrepair'),
('U_3456789_3456789_File2_jumpjump'),
('B_1134_445673_File3_plane'),
('I_111345_333345_File4_chupapimonienio'),
('P_1156_3556_File5 idk what');

You can use the query:

SELECT [File],
  SUBSTRING([File], 
            
            -- Start taking after the second underscore
            -- in the original field value
            CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1,  

            -- Continue taking for the length between the
            -- index of the second and third underscores
            CHARINDEX('_', [File], CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1) - (CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1)) AS Part
FROM [Files];

To get the results:

File Part
U_1456789_23456789_File1_automaticrepair 23456789
U_3456789_3456789_File2_jumpjump 3456789
B_1134_445673_File3_plane 445673
I_111345_333345_File4_chupapimonienio 333345
P_1156_3556_File5 idk what 3556

See the SQL Fiddle


Edit: to brute force support for inputs with only two underscores:

CREATE TABLE [Files] ([File] NVARCHAR(200));
INSERT INTO [Files] VALUES 
('U_1456789_23456789_File1_automaticrepair'),
('U_3456789_3456789_File2_jumpjump'),
('B_1134_445673_File3_plane'),
('I_111345_333345_File4_chupapimonienio'),
('P_1156_3556_File5 idk what'),
('K_25444_filenamecar');

Add a case for when a third underscore could not be found and adjust the start position/length passed to SUBSTRING.

SELECT [File],
  CASE WHEN CHARINDEX('_', [File], CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1) = 0
    THEN
      SUBSTRING([File],
                CHARINDEX('_', [File]) + 1,
                CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) - (CHARINDEX('_', [File]) + 1))
    ELSE
      SUBSTRING([File], 
                CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1,  
                CHARINDEX('_', [File], CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1) - (CHARINDEX('_', [File], CHARINDEX('_', [File]) + 1) + 1))
    END AS Part
FROM [Files];
File Part
U_1456789_23456789_File1_automaticrepair 23456789
U_3456789_3456789_File2_jumpjump 3456789
B_1134_445673_File3_plane 445673
I_111345_333345_File4_chupapimonienio 333345
P_1156_3556_File5 idk what 3556
K_25444_filenamecar 25444

See the SQL Fiddle

Note that this approach is even more brittle and you're definitely in the realm of problem that is likely better handled in application code instead of by the SQL engine.

D M
  • 5,769
  • 4
  • 12
  • 27
  • Something is wrong, when i select top 1000 that was working fine but when i select top 10000 it has error "Invalid lenght parameter passed to the LEFT or SUBSTRING function" – Mgkraft Sep 15 '21 at 15:13
  • Do you have some values that don't have a second/third underscore? This is a brittle solution that assumes a given input pattern. – D M Sep 15 '21 at 15:14
  • yes, i found only 5 and looks like K_25444_filenamecar – Mgkraft Sep 15 '21 at 15:20
  • I added a section about falling back to the first/second underscore if the second/third don't exist. I would seriously consider a different approach unless this is a one-off solution. – D M Sep 15 '21 at 15:30
  • thats fine everything, do you know maybe how i can cut chupapimonienio from [I_111345_333345_File4_chupapimonienio] and if [K_25444_filenamecar] cut filenamecar ? – Mgkraft Sep 21 '21 at 14:25
  • I tried right([File], len([File]) - charindex('_', [File], charindex('_', [File], charindex('_', [File]) for I_111345_333345_File4_chupapimonienio and right([File], len([File]) - charindex('_', [File], charindex('_', [File] for K_25444_filenamecar for but idk how to make case – Mgkraft Sep 21 '21 at 14:31
  • Sure, SqlFiddle is down right now so I can't show you a demo, but [here's a question](https://stackoverflow.com/questions/1024978/find-index-of-last-occurrence-of-a-sub-string-using-t-sql/20369450) about finding the last occurrence of a character in a field. – D M Sep 21 '21 at 15:03
  • The answer suggests using [`REVERSE`](https://learn.microsoft.com/en-us/sql/t-sql/functions/reverse-transact-sql?view=sql-server-ver15) then `CHARINDEX` to find the last occurrence in the string, then `SUBSTRING` from the [`LEN`](https://learn.microsoft.com/en-us/sql/t-sql/functions/len-transact-sql?view=sql-server-ver15) minus the index you found to take everything after the last occurrence. – D M Sep 21 '21 at 15:04
  • but how to make case here ? – Mgkraft Sep 21 '21 at 15:14
  • I mean " I tried right([File], len([File]) - charindex('', [File], charindex('', [File], charindex('', [File]) for I_111345_333345_File4_chupapimonienio and right([File], len([File]) - charindex('', [File], charindex('_', [File] for K_25444_filenamecar" this is fine but idk how can i add CASE here ... – Mgkraft Sep 21 '21 at 15:18
  • You want the contents of the field after the last underscore, right? There's a better way than hardcoding cases for different numbers of underscores in the field. Ask another question including details about your data and the output you want if you need more details. – D M Sep 21 '21 at 15:19
  • Yes i know but some files are like L_243434_33333_Super_Mountain_Landscape.png – Mgkraft Sep 21 '21 at 15:21
  • I assume you'd want "Super_Mountain_Landscape" from that? – D M Sep 21 '21 at 15:21
  • yes and that "right([File], len([File]) - charindex('', [File], charindex('', [File], charindex('', [File])" give me this – Mgkraft Sep 21 '21 at 15:23
  • the problem is that not working for files like K_34343_Superauto ... – Mgkraft Sep 21 '21 at 15:26
  • I don't have a good solution. If you know the input and desired output, you can of course write an expression that works for that one input and output. Choosing which expression to use for which input/output is more logic than just a `CASE` statement. I suspect you're going to need to do some manipulation in code after fetching results from SQL. – D M Sep 21 '21 at 15:26
  • For example, you say that `I_111345_333345_File4_chupapimonienio` should return `chupapimonienio` which could be done by taking everything after the last underscore. But that wouldn't work for `L_243434_33333_Super_Mountain_Landscape.png` where you want `Super_Mountain_Landscape`. A `CASE` statement won't help you until you figure out how to tell which rule to apply to which field. If you have a good set of input/expected output, a new question could help give you ideas. – D M Sep 21 '21 at 15:29
  • not exactly I_111345_333345_File4_chupapimonienio should return File4_chupapimonienio everything is after third underscore – Mgkraft Sep 21 '21 at 15:31
  • for files like l_33333_44444_file_chupapimonienio return everything after third underscore and for files like K_2137911_bus return everything after second underscore – Mgkraft Sep 21 '21 at 15:33
  • That's just the code in my answer without an ending `SUBSTRING` index. [SqlFiddle](http://sqlfiddle.com/#!18/27741/8) – D M Sep 21 '21 at 15:41
  • Yes i see and its everything working great :DDD, really big thanks for your help, you are amazing :) – Mgkraft Sep 21 '21 at 17:35