-1

To part this question from other questions:

  • The numbers are in every string
  • There are sometimes more than one number in a string
  • I only need the biggest one, representing money, not year
  • Euro sign doesn't appear in front of every number
  • 'pretax' doesn't appear in every row
  • I don't have the right to create function

I want to have something like this

Input:

Finance

€10m - €50m pretax
<€5m pretax
>€100m
€10m - 50m pretax from 2019

Output:

Number

50
5
100
50

I have tried this

select SUBSTRING([Finance], len([Finance])-CHARINDEX('€',REVERSE([Finance]))+2, 4)

but 4 doesn't always give the correct length, and doesn't work for €10m - 50m, using 'm' as another index could be a solution, but too complex. Is there any other solution which is accurate and elegant?

JWW
  • 23
  • 5
  • 4
    SQL isn't really an ideal tool for parsing strings. – Radu Gheorghiu Nov 25 '19 at 09:05
  • If you are in an environment supporting CLR you could write a CLR function for this – Martin Smith Nov 25 '19 at 09:07
  • 3
    Possible duplicate of [Query to get only numbers from a string](https://stackoverflow.com/questions/16667251/query-to-get-only-numbers-from-a-string) – Amira Bedhiafi Nov 25 '19 at 09:17
  • @MartinSmith thanks for that, but actually no, preferred solution would be everything in one script, :( – JWW Nov 25 '19 at 09:18
  • @xXx Thanks for that, but I have explained that the existing answers don't solve the current problem. Thus I asked this question. :( – JWW Nov 25 '19 at 09:37
  • @JiawenWang I can't understand this case ? '€10m - €50m ' – Amira Bedhiafi Nov 25 '19 at 09:38
  • @xXx could you be more specific? – JWW Nov 25 '19 at 09:39
  • @JiawenWang why are you extracting only 50 ?why not 10 ? – Amira Bedhiafi Nov 25 '19 at 09:43
  • @xXx Because I only need the biggest number, and this is also a very big difference comparing to Question 'Query to get only numbers from a string'. and people just ignore that... – JWW Nov 25 '19 at 09:49
  • 1
    If you want the largest number, of the last row, then its 2019? Not 50 - how do you intend to differentiate that? – Dale K Nov 25 '19 at 10:08
  • @DaleK you are right, it could be confusing. I was going to use euro sign as 'anchor', so it's only the number representing money that we want to extract. I will edit the question too – JWW Nov 25 '19 at 10:14
  • Just FYI if you have access to process them in any other place than SQL you could simplify the process. Example in JS: https://jsfiddle.net/rjef1nLc/ – Mukyuu Nov 25 '19 at 10:35
  • @Mukyuu thank you for the nice demo! it's a new possibility – JWW Nov 25 '19 at 10:56

2 Answers2

2

SQL Server is not optimal for string parsing. But you seem to have a very limited set of substrings you are looking for.

  • They end with 'm'.
  • They have '<', '>', and '€'.
  • They are separated by spaces.

This suggests splitting the string and doing some more basic processing on each piece. For your sample data, this works:

with t as (
      select v.*
      from (values ('€10m - €50m pretax'), ('<€5m pretax'), ('>€100m')) v(str)
     )
select *
from t outer apply
     (select top (1) try_convert(int, replace(replace(replace(replace(s.value, '€', ''), 'm', ''), '<', ''), '>', '')) as num
      from string_split(t.str, ' ') s
      where value like  '%m'
      order by try_convert(int, replace(replace(replace(replace(s.value, '€', ''), 'm', ''), '<', ''), '>', '')) desc
     ) x;

Here is a db<>fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you very much for the neat solution, I would love to try, but the old version of SQL server blocked my way. Pity that I cannot upgrade the compatibility level to 130 to use the string_split function. – JWW Dec 02 '19 at 10:14
  • @JWW . . . There are many examples of "split" functions on the web for earlier versions of SQL Server. – Gordon Linoff Dec 02 '19 at 11:14
  • true, that's a good reminder! I will check it out. @gordonlinoff – JWW Dec 02 '19 at 12:21
2

If the STRING_SPLIT function isn't available in your Sql Server version?
Then here is a "Way To Complicated For Something That Should Be Simple" method to get those max amounts.

It uses a recursive CTE to loop through the strings.
Then gets the MAX amounts from that.

Example snippet:

-- Sample data
DECLARE @Table table (
  Id int identity(1,1) primary key,
  [Finance] nvarchar(100)
);
INSERT INTO @Table ([Finance]) VALUES 
  ('€10m - €50m pretext')
 ,('<€5m pretext')
 ,('\>€100m')
 ,('€10m - 50m pretax from 2019 ')
 ,('abc €123m def 456m ghi')
;

;WITH RCTE AS
(
   -- The seed query
   SELECT Id, 
    [Finance] AS Str, 
    0 AS Lvl, 
    CAST(NULL AS INT)  AS Num,
    PATINDEX('%[0-9]%', [Finance]) AS pos1,
    PATINDEX('%[0-9][^0-9]%', [Finance]) AS pos2,
   SUBSTRING([Finance],PATINDEX('%[^0-9][0-9]%', [Finance]),1) AS Prefix
   FROM @Table
   WHERE [Finance] LIKE '%[0-9]m%'

   UNION ALL

   -- Looping through the strings
   SELECT Id, 
    SUBSTRING(Str,pos2+1,len(Str)),
    Lvl+1,
    TRY_CAST(SUBSTRING(Str,pos1,pos2-pos1+1) AS INT),
    PATINDEX('%[0-9]%', SUBSTRING(Str,pos2+1,LEN(Str))),
    PATINDEX('%[0-9][^0-9]%', SUBSTRING(Str,pos2+1,LEN(Str))),
    SUBSTRING(Str,pos1-1,1)
    FROM RCTE
    WHERE Str LIKE '%[0-9]m%'
),
AMOUNTS AS
(
    SELECT Id, 
    MAX(Prefix) AS Prefix,
    MAX(Num) AS MaxAmount
    FROM RCTE
    GROUP BY Id
)
SELECT t.Id, a.MaxAmount, a.Prefix, t.[Finance]
FROM @Table t
LEFT JOIN AMOUNTS a ON a.Id = t.Id
ORDER BY t.Id;

Result:

Id  MaxAmount   Prefix  Finance
1   50          €       €10m - €50m pretext
2   5           €       <€5m pretext
3   100         €       \>€100m
4   50          €       €10m - 50m pretax from 2019 
5   456         €       abc €123m def 456m ghi

A test on rextester here

But i.m.h.o. using STRING_SPLIT seems nicer.

DECLARE @Table table (
  Id int identity(1,1) primary key,
  [Finance] nvarchar(100)
);

INSERT INTO @Table ([Finance]) VALUES 
  ('€10m - €50m pretext')
 ,('<€5m pretext')
 ,('\>€100m')
 ,('€10.0m - 50m pretax from 2019 ')
 ,('abc €123m def 456m ghi')
 ,('200.5m & 50m')


SELECT t.Id, a.Prefix, a.MaxAmount, t.[Finance]
FROM @Table t 
OUTER APPLY
(
  SELECT MAX(LEFT(str,1)) AS Prefix, MAX(TRY_CAST(STUFF(str,1,1,'') AS FLOAT)) AS MaxAmount
  FROM
  (
    SELECT RIGHT(' '+value, PATINDEX('%[^0-9.]%', REVERSE(' '+value))) AS str
    FROM STRING_SPLIT(t.[Finance], 'm') AS spl
    WHERE value LIKE  '%[0-9]'
  ) q
) AS a

A test on db<>fiddle here

LukStorms
  • 28,916
  • 5
  • 31
  • 45