0

I have a table of URLs with data looking as shown below. The number (Steam AppId) following /app/ is what I'd like to insert into another table's INT column. Sometimes the data may be followed by an ending slash or more text that I don't need.

https://store.steampowered.com/app/123
https://store.steampowered.com/app/1234/
https://store.steampowered.com/app/1234567800/some_game_title/
https://store.steampowered.com/app/999https://baddata.com

This regex group 1 would isolate the number I need:

.*\/app\/(\d*)

However I don't know how to apply this to an SQL query. I'm using an Azure database to be clear.

The expected results would be

123
1234
1234567800
999

edit: added another url example for undesired characters following the AppId

  • 1
    I guess the first step would be to look into what string functions SQL Server provides and see if you can use them to create a solution – Dale K Jul 05 '23 at 04:04
  • 1
    The rule I would use would be: 1. Chop of the leftmost 34 characters 2. if a `/` exists in what remains, chop it and everything to the right of it off. – Nick.Mc Jul 05 '23 at 04:25
  • Does this answer your question? [Select query to remove non-numeric characters](https://stackoverflow.com/questions/18625548/select-query-to-remove-non-numeric-characters) – Ruud Helderman Jul 05 '23 at 06:29

3 Answers3

1
DECLARE @tempTable TABLE(
  urls VARCHAR(100)
)

INSERT INTO @tempTable(urls) VALUES ('https://store.steampowered.com/app/123')
INSERT INTO @tempTable(urls) VALUES ('https://store.steampowered.com/app/1234/')
INSERT INTO @tempTable(urls) VALUES ('https://store.steampowered.com/app/1234567800/some_game_title/')

SELECT
  CAST(SUBSTRING(urls, CHARINDEX('/app/', urls) + 5, ISNULL(NULLIF(CHARINDEX('/', urls, CHARINDEX('/app/', urls) + 5), 0), LEN(urls) + 1) - CHARINDEX('/app/', urls) - 5) AS INT) AS AppID
FROM
  @tempTable
WHERE
  CHARINDEX('/app/', urls) > 0;

enter image description here

Gehan Fernando
  • 1,221
  • 13
  • 24
  • Forget about the @tempTable, you can take SELECT query and replace your table name and column names with it – Gehan Fernando Jul 05 '23 at 05:02
  • 1
    Thank you! It's working great until I came across one such URL where the end of the address looks like this /app/317400?snr=2_9_100000_ I take it that the CHARINDEX('/' defines the cutoff.... is there an effective way to include more characters or stop before a non numeric character? – Peter Mercury Jul 05 '23 at 05:15
  • 2
    Please don't use images as per the site guide, use formatted tabular data. – Dale K Jul 05 '23 at 05:23
  • Sorry, bad data could also include something like this. Where the Url is doubled. Everything after the last numeric character is undesired. ``` https://store.steampowered.com/app/123https://somethingelse.com/345 ``` – Peter Mercury Jul 05 '23 at 05:25
  • @PeterMercury SQL is the wrong place to process and clean string values. If you need to be able to handle a variety of outlier formats, process your data *before* ingesting it in to your database. – MatBailie Jul 05 '23 at 07:56
  • You could replace question marks as slashes for a quick fix. – shawnt00 Jul 05 '23 at 08:29
  • @MatBailie The data was ingested from an existing database to my own. Whether I'm cleaning the original data or not is another matter. – Peter Mercury Jul 05 '23 at 11:39
1

Here is hopefully something that will work for you.

  • Find the position of the first numeric in the URL
  • Find the next non-numeric following this position
  • Take the string between these two positions

 select url, Substring(url, p1.p + 5, IsNull(NullIf(p2.p, 0), 50)) AppId
 from t
 cross apply(values(patindex('%/app/[0-9]%', url)))p1(p)
 cross apply(values((patindex('%[^0-9]%', Right(url, Len(url) - p1.p - 5 )))))p2(p);

Demo Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33
0

Another method by using tokenization via XML and XQuery.

Observing that numeric value in question is always in a 5th position in a string of tokens.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, url VARCHAR(200));
INSERT @tbl (url) VALUES
('https://store.steampowered.com/app/123'),
('https://store.steampowered.com/app/1234/'),
('https://store.steampowered.com/app/1234567800/some_game_title/'),
('https://store.steampowered.com/app/317400?snr=2_9_100000_I');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '/';

SELECT t.*
    , c.value('(/root/r[5]/text())[1]', 'INT') AS num
FROM @tbl AS t
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(TRANSLATE(url,'?',@separator), @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c);

Output

Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21