2

So I'm exploring a table where I found a couple of fields that should be numerics, however as the source is an excel file so people have mixed already letters, symbols, and numbers.

I'm trying to clean up these fields but I can find too many kinds of symbols so currently, I've performed the cleaning with a couple of cases like:

SELECT
    CASE 
       WHEN Montant_Devise LIKE '%Free%' THEN '0'
       WHEN Montant_Devise LIKE ' ' THEN '0'
       WHEN Montant_Devise = '' THEN '0' 
       WHEN Montant_Devise = '-' THEN '0' 
       ELSE CAST(COALESCE(RTRIM(LTRIM(REPLACE(REPLACE(REPLACE(Montant_Devise, '€', ''), ' ', ''), ',', '.'))), '0') AS DECIMAL(20,2))
    END
FROM
    [dbo].[table_BI]

So, the deal here is that probably I may find new symbols in the future than to avoid include the new symbols in my case I would like to know how could I replace doesn't matter which symbol or letter by the default 0 zero.

Do you know guys how I could do this?

Thanks

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Andres Urrego Angel
  • 1,842
  • 7
  • 29
  • 55
  • 1
    Can't you just set this field to 0 by default? Can you provide more clarification on what you are trying to accomplish please, maybe some input and output examples. – Josh Adams Aug 17 '17 at 17:33
  • check it out @JoshAdams by default the field already is mixed between numbers, letters and symbols – Andres Urrego Angel Aug 17 '17 at 17:35
  • This is totally unclear. Please [edit] your post to be more specific about what you're trying to do and include your efforts to solve the problem yourself. – Ken White Aug 17 '17 at 17:36
  • This will help. https://stackoverflow.com/questions/21378193/regex-pattern-inside-sql-replace-function. – Dan Bracuk Aug 17 '17 at 17:41
  • check it out @KenWhite – Andres Urrego Angel Aug 17 '17 at 17:41
  • Rather than trying to create a never-ending list of unacceptable values, would it not make more sense to define what you do want to accept (e.g. only the digits 0-9) and specifically check for and allow them? – Damien_The_Unbeliever Aug 17 '17 at 17:44
  • I think looking and replacing is not the best solution. I would either: 1) update the database to use correct values, or 2) just hard ode an insert with '0' since it appears you will always set this to 0 – Josh Adams Aug 17 '17 at 17:45
  • We wary of your local specifier for thousands and decimal when applying some of the solutions this question may generate – billinkc Aug 17 '17 at 17:54
  • sql server needs regex support – Oto Shavadze Aug 17 '17 at 18:01
  • @OtoShavadze, this is SSIS so the OP could write a script task to use regex – HLGEM Aug 17 '17 at 18:54

2 Answers2

2

Start with

CASE WHEN patindex(Montant_Devise  '%[^0-9]%' > 0 '0'

Then do the empty string. Then do your else.

Dan Bracuk
  • 20,699
  • 4
  • 26
  • 43
2

I don't see a way to do this without a function, which is going to kill on performance. With that said, there are many out there. Here is one.

CREATE Function [fnRemoveNonNumericCharacters](@strText VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
    WHILE PATINDEX('%[^0-9]%', @strText) > 0
    BEGIN
        SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '0')
    END
    RETURN @strText
END

The execution

declare @table table (x varchar(64) )
insert into @table
values
('332148790832473487...DFSKDJFH&#@!@#'),
('0X0C0V1234')

update t
set x = dbo.RemoveNonNumericCharacters(t.x)
from @table t

select * from @table

RETURNS

x
33214879083247348700000000000000000
0000001234

Alternatively, in the function you can just remove the non-numeric by changing

STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '0')

to...

STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')

S3S
  • 24,809
  • 5
  • 26
  • 45