19

Is there any function to encode HTML strings in T-SQL? I have a legacy database that contains dodgy characters such as '<', '>' etc. I could write a function to replace the characters, but is there a better way?

I have an ASP.NET application, and when it returns a string, the string contains characters that cause an error. The ASP.NET application reads the data from a database table; it does not write to the table itself.

Leo Moore
  • The answers below are good but if those characters shouldn't be in the data then I'd suggest cleaning the data. Otherwise James is spot on. – Lazarus Mar 12 '09 at 16:31
  • The characters are correct in the data, and if I change the data I could break the legacy app. So that's not an option. – Leo Moore Mar 12 '09 at 16:57
  • If your problem is in your ASP.NET code, then the 'best practices' way to handle this is to use the Server.HtmlEncode() function in the ASP.NET layer. Technically, you aren't supposed to store 'processed' data in your DB; you want the plain, real data there, not customized for a particular presentation system (HTML). If at some point you need just the plain text without HTML entities, you still have a clean version of it in your DB. – Steve May 03 '10 at 13:15

10 Answers

29

It's a bit late, but anyway, here are the proper ways:

HTML-Encode (this uses XML encoding, which escapes &, < and > the same way HTML does):

DECLARE @s NVARCHAR(100)
SET @s = N'<html>unsafe & safe Utf8CharsDon''tGetEncoded ÄöÜ - "Conex"<html>'
SELECT (SELECT @s FOR XML PATH(''))

HTML-encode in a query:

SELECT 
    FIELD_NAME  
    ,(SELECT FIELD_NAME AS [text()] FOR XML PATH('')) AS FIELD_NAME_HtmlEncoded 
FROM TABLE_NAME

HTML-Decode:

SELECT CAST('<root>' + '&lt;root&gt;Test&amp;123' + '</root>' AS XML).value(N'(root)[1]', N'varchar(max)');
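The XML-based decode only understands what XML itself defines. As a small sketch of what it can and cannot handle (the switch to nvarchar(max), so that non-ASCII results survive, is my own change and not part of the query above):

```sql
-- Sketch: what the CAST(... AS XML) decode can handle.
DECLARE @Encoded NVARCHAR(MAX) = N'&lt;b&gt; &amp; &quot; &#228;';

-- The five predefined XML entities (&lt; &gt; &amp; &quot; &apos;) and numeric
-- character references such as &#228; decode fine.
SELECT CAST(N'<root>' + @Encoded + N'</root>' AS XML).value(N'(root)[1]', N'nvarchar(max)') AS Decoded;
-- Decoded: <b> & " ä

-- Named HTML entities such as &auml; or &copy; are not defined in XML,
-- so the same CAST raises an XML parsing error for them.
```

If your data contains named HTML entities, this approach is not enough, and the CLR route is the safer option.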

If you want to do it properly, you can use a CLR function.
However, it gets a bit complicated, because you can't use the System.Web assembly in SQL CLR code (so you can't simply call System.Web.HttpUtility.HtmlDecode(htmlEncodedStr);). So you would have to write your own HttpUtility class, which I wouldn't recommend, especially for decoding.

Fortunately, you can rip System.Web.HttpUtility out of the Mono source code (the open-source .NET implementation that also runs on Linux). Then you can use HttpUtility without referencing System.Web.

Then you write these CLR functions:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

namespace ClrFunctionsLibrary
{
    public class Test
    {
        // System.Web.HttpUtility here is the class ripped from the Mono sources,
        // because the real System.Web cannot be referenced from SQL CLR code.
        [SqlFunction]
        public static SqlString HtmlEncode(SqlString sqlstrTextThatNeedsEncoding)
        {
            // SQL NULL in, SQL NULL out (.Value throws on a null SqlString)
            if (sqlstrTextThatNeedsEncoding.IsNull)
                return SqlString.Null;

            string strHtmlEncoded = System.Web.HttpUtility.HtmlEncode(sqlstrTextThatNeedsEncoding.Value);
            return new SqlString(strHtmlEncoded);
        }

        [SqlFunction]
        public static SqlString HtmlDecode(SqlString sqlstrHtmlEncodedText)
        {
            // SQL NULL in, SQL NULL out
            if (sqlstrHtmlEncodedText.IsNull)
                return SqlString.Null;

            string strHtmlDecoded = System.Web.HttpUtility.HtmlDecode(sqlstrHtmlEncodedText.Value);
            return new SqlString(strHtmlDecoded);
        }

        public const double SALES_TAX = .086;

        // http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
        [SqlFunction()]
        public static SqlDouble addTax(SqlDouble originalAmount)
        {
            SqlDouble taxAmount = originalAmount * SALES_TAX;

            return originalAmount + taxAmount;
        }

    } // End Class Test

} // End Namespace ClrFunctionsLibrary

And register it:

GO

/*
-- One-time setup, needed if CLR integration is not enabled on the server yet:
EXEC sp_CONFIGURE 'show advanced options' , '1';
GO
RECONFIGURE;
GO
EXEC sp_CONFIGURE 'clr enabled' , '1'
GO
RECONFIGURE;
GO

EXEC sp_CONFIGURE 'show advanced options' , '0';
GO
RECONFIGURE;
*/

/*
--http://stackoverflow.com/questions/72281/error-running-clr-stored-proc
-- Only needed for PERMISSION_SET = UNSAFE
EXEC sp_changedbowner 'sa'
ALTER DATABASE YOUR_DB_NAME SET TRUSTWORTHY ON 

GO
*/

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HtmlEncode]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[HtmlEncode]
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HtmlDecode]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[HtmlDecode]
GO

IF EXISTS (SELECT * FROM sys.assemblies asms WHERE asms.name = N'ClrFunctionsLibrary' and is_user_defined = 1)
DROP ASSEMBLY [ClrFunctionsLibrary]
GO

--http://msdn.microsoft.com/en-us/library/ms345101.aspx
CREATE ASSEMBLY [ClrFunctionsLibrary]
AUTHORIZATION [dbo]
FROM 'D:\username\documents\visual studio 2010\Projects\ClrFunctionsLibrary\ClrFunctionsLibrary\bin\Debug\ClrFunctionsLibrary.dll' 
WITH PERMISSION_SET = UNSAFE  --EXTERNAL_ACCESS  --SAFE
;
GO

CREATE FUNCTION [dbo].[HtmlDecode](@value [nvarchar](max))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS 
-- [AssemblyName].[Namespace.Class].[FunctionName]
EXTERNAL NAME [ClrFunctionsLibrary].[ClrFunctionsLibrary.Test].[HtmlDecode]
GO

CREATE FUNCTION [dbo].[HtmlEncode](@value [nvarchar](max))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS 
-- [AssemblyName].[Namespace.Class].[FunctionName]
EXTERNAL NAME [ClrFunctionsLibrary].[ClrFunctionsLibrary.Test].[HtmlEncode]
GO

Afterwards, you can use them like normal functions:

SELECT
     dbo.HtmlEncode(N'helloäÖühello123') AS Encoded
    ,dbo.HtmlDecode(N'hello&auml;&Ouml;&uuml;hello123') AS Decoded 

If you just copy-paste this, please note that for efficiency reasons you would rather write addTax like this if you used it in production:

// 1 + tax rate: one multiplication instead of a multiplication plus an addition
public const double SALES_TAX = 1.086;

// http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
[SqlFunction()]
public static SqlDouble addTax(SqlDouble originalAmount)
{
     return originalAmount * SALES_TAX;
}

See here for the edited mono classes:
http://pastebin.com/pXi57iZ3
http://pastebin.com/2bfGKBte

You need to define NET_2_0 in the project's build options.

Stefan Steiger
  • @Rez.Net: Be careful, however: this can and will throw on valid HTML entities that are not defined in XML, such as those for ä ö © é â œ ç Ω ß æ Œ ñ etc. – Stefan Steiger Sep 25 '14 at 06:40
27

We have a legacy system that uses a trigger and dbmail to send HTML-encoded email when a record is entered into a table, so we need the encoding to happen during email generation. I noticed that Leo's version has a slight bug that encodes the & in &lt; and &gt;. I use this version:

CREATE FUNCTION HtmlEncode
(
    @UnEncoded as varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
  DECLARE @Encoded as varchar(500)

  --order is important here. Replace the amp first, then the lt and gt. 
  --otherwise the &lt will become &amp;lt; 
  SELECT @Encoded = 
  Replace(
    Replace(
      Replace(@UnEncoded,'&','&amp;'),
    '<', '&lt;'),
  '>', '&gt;')

  RETURN @Encoded
END
GO
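If you also need to encode quotes (for example when the value ends up inside an HTML attribute), or need more than 500 characters of Unicode text, the same REPLACE chain can be extended. This is only a sketch: dbo.HtmlEncodeQuotes is a hypothetical name, and &#39; is used for the single quote because &apos; is not defined in HTML 4.

```sql
-- Hypothetical variant of the function above: NVARCHAR(MAX) so Unicode and
-- long values survive, and quotes are escaped as well.
CREATE FUNCTION dbo.HtmlEncodeQuotes
(
    @UnEncoded NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    -- '&' must still be replaced first; otherwise the '&' characters produced
    -- by the later replacements would be encoded a second time.
    -- REPLACE returns NULL for NULL input, so NULL stays NULL.
    RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@UnEncoded,
               N'&', N'&amp;'),
               N'<', N'&lt;'),
               N'>', N'&gt;'),
               N'"', N'&quot;'),
               N'''', N'&#39;');
END
GO
```

For example, dbo.HtmlEncodeQuotes(N'a < b & "c"') returns a &lt; b &amp; &quot;c&quot;.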
Beniaminus
  • Thanks, you are correct. I did change it in production but forgot to update the previous post. – Leo Moore Sep 29 '09 at 10:15
  • @Beniaminus: While it eliminates the most dangerous XML characters, that's actually far from "HTML-encoded", but I guess you know that yourself :) – Stefan Steiger Sep 03 '14 at 07:39
16

You shouldn't fix the string in SQL. A better way is to use the ASP.NET function HtmlEncode, which will encode the special characters that cause the issues you're seeing (see the example below). I hope this helps.

string htmlEncodedStr = System.Web.HttpUtility.HtmlEncode(yourRawStringVariableHere);
string decodedRawStr =  System.Web.HttpUtility.HtmlDecode(htmlEncodedStr);

Edit: Since you're data binding this from a DataTable, use an inline expression to call HtmlEncode in the markup of the GridView or whatever control you're using; this will still satisfy your data binding requirement. See the example below. Alternatively, you can loop over every record in the DataTable and update each cell with the HTML-encoded string prior to data binding.

<%# System.Web.HttpUtility.HtmlEncode(Eval("YourColumnNameHere")) %>
James
  • You can also use a BoundField. http://msdn.microsoft.com/en-us/library/system.web.ui.webcontrols.boundfield.aspx – bobince Mar 12 '09 at 16:43
7

I don't think data in a database should know or care about the user interface. Display issues should be handled by the presentation layer. I wouldn't want to see any HTML mingled into the database.

duffymo
  • I agree completely, but it's not my choice. It's a legacy app with HTML-type characters in the Guid (or what passes as the Guid). – Leo Moore Jun 12 '09 at 09:29
  • Presentation in the primary key? OMG. I'd refactor that as quickly as possible. – duffymo Jun 12 '09 at 09:51
  • @duffymo: Why not? It may be a busy site. Saving content HtmlEncoded saves the encoding on every request. E.g. rendered HTML created from wiki markup: you can pre-render it when the markup is changed and saved, which saves a lengthy render on every request. – Stefan Steiger Sep 17 '13 at 15:28
  • Ridiculous. Lengthy? Nope. I stand by my original ridicule. – duffymo Sep 17 '13 at 17:30
  • @duffymo: Just for the record, I meant saving the presentation text HTML-encoded in an additional db field. Using presentation text as a primary key is stupid, I agree on that 120%. – Stefan Steiger Feb 07 '17 at 14:39
  • Has this been bothering you for more than three years? Why are you replying again now? You didn't sound like you agreed 120% three years ago. – duffymo Feb 07 '17 at 14:57
  • I just saw this six additional years later, and I also enjoyed the ridicule of the primary key. In addition to having presentation information in the primary key, if the primary key is also the clustering key, then the relatively large length of the key will cause index key bloat. So now we're up to 135% agreement. :) – chris Jun 02 '23 at 14:28
6

You can simply use FOR XML PATH in your query. For example:

DECLARE @encodedString VARCHAR(MAX)

SET @encodedString = 'give your html string you want to encode'

SELECT @encodedString
SELECT (SELECT @encodedString FOR XML PATH(''))

You can then use this in your own SQL function if you wish. Hope this helps.
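A minimal sketch of such a wrapper, assuming you want a scalar function (dbo.XmlHtmlEncode is a hypothetical name). As far as I know, FOR XML escapes &, < and > in text nodes but leaves quotes alone, so this suits element content rather than attribute values.

```sql
-- Hypothetical scalar wrapper around the FOR XML PATH('') trick.
CREATE FUNCTION dbo.XmlHtmlEncode (@Raw NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    -- The [text()] alias emits the value as a bare text node, so SQL Server
    -- escapes &, < and > without wrapping the result in element tags.
    RETURN (SELECT @Raw AS [text()] FOR XML PATH(''));
END
GO
```

Usage: SELECT dbo.XmlHtmlEncode(N'<b>Tom & Jerry</b>') returns &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;.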

2

If you're displaying a string on the web, you can encode it with Server.HtmlEncode().

If you're storing a string in the database, make sure the database field is "nchar" instead of "char". That will allow it to store Unicode strings.

If you can't control the database, you can "flatten" the string to ASCII with Encoding.ASCII.GetString.

Andomar
0

I've been trying to do this today in T-SQL, mostly for fun at this point since my requirements changed, but I figured out one way. You can use a table of Unicode characters, built with the NCHAR() function (or simply imported), iterating from 0 to 65535 (or less if you only need the first 512 or so). Then rebuild the string character by character. There are probably better ways to rebuild the string, but this works in a pinch.

--store unicode chars into a table so you can replace those characters with the decimal value
CREATE TABLE #UnicodeCharacters( DecimalValue INT, UnicodeCharacter NCHAR(1) ) ;

--loop from 0 to highest unicode value you want and dump to the table you created
DECLARE @x INT = 0;
WHILE @x <= 65535
    BEGIN
        INSERT INTO #UnicodeCharacters(DecimalValue, UnicodeCharacter)
        SELECT  @x, NCHAR(@x);

        SET @x = @x + 1;
    END
;

--index for fast retrieval
CREATE CLUSTERED INDEX CX_UnicodeCharacter_DecimalValue ON #UnicodeCharacters(UnicodeCharacter, DecimalValue);

--this is the string that you want to html-encode...
DECLARE @String NVARCHAR(100) = N'人This is a test - Ñ';

--other vars (MAX, because every encoded character expands to several characters)
DECLARE @NewString NVARCHAR(MAX) = N'';
DECLARE @Pos INT = 1;
DECLARE @Letter NCHAR(1);

--run through the string and check each character to see if it is outside the allowed set
--(DATALENGTH / 2 instead of LEN, so trailing spaces are not skipped)
WHILE @Pos <= DATALENGTH(@String) / 2
BEGIN
    SET @Letter = SUBSTRING(@String, @Pos, 1);
    --rebuild the string, replacing each character outside the set with &#[unicode value];
    --'&' is deliberately not in the set, and '-' comes last so it is a literal, not a range
    SELECT  @NewString = @NewString + 
                CASE 
                    WHEN @Letter LIKE N'[0-9abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^*()_+= -]' THEN @Letter
                    ELSE N'&#' + CAST(uc.DecimalValue AS NVARCHAR(10)) + N';'
                END
    FROM    #UnicodeCharacters uc
    WHERE   @Letter = uc.UnicodeCharacter COLLATE JAPANESE_UNICODE_BIN;

    SET @Pos += 1;
END

--end result
SELECT @NewString
;

I know you would typically use [0-9A-Za-z], but when I did that, it treated accented characters as within the scope of the expression (under a non-binary collation, a range like [A-Z] follows the collation's sort order, in which accented letters sort among the plain ones). So I explicitly listed every character that I didn't want to convert in the expression.

Last note: I had to use a different collation to match Unicode characters, because the default Latin collation (CI or otherwise) seemed to incorrectly match accented characters, much like the pattern in the LIKE.
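The 65,536-row lookup table can probably be avoided: the built-in UNICODE() function returns the integer code of the first character it is given. The following sketch (dbo.HtmlEncodeNumeric is a hypothetical name) keeps ASCII letters, digits and spaces and turns everything else into a numeric character reference. Comparing code points directly also sidesteps the collation issues described above. One assumption to be aware of: in a non-_SC collation, characters outside the Basic Multilingual Plane are stored as surrogate pairs, and this loop would emit each half as its own &#N; reference, which is not valid HTML.

```sql
-- Hypothetical function: numeric character references without a lookup table.
CREATE FUNCTION dbo.HtmlEncodeNumeric (@String NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    IF @String IS NULL RETURN NULL;

    DECLARE @NewString NVARCHAR(MAX) = N'';
    DECLARE @Pos  INT = 1;
    DECLARE @Len  INT = DATALENGTH(@String) / 2;  -- LEN() would ignore trailing spaces
    DECLARE @Code INT;

    WHILE @Pos <= @Len
    BEGIN
        -- UNICODE() returns the code of the first character it is given.
        SET @Code = UNICODE(SUBSTRING(@String, @Pos, 1));
        SET @NewString = @NewString +
            CASE
                -- Keep ASCII digits, letters and the space character as-is.
                WHEN @Code BETWEEN 48 AND 57
                  OR @Code BETWEEN 65 AND 90
                  OR @Code BETWEEN 97 AND 122
                  OR @Code = 32
                    THEN SUBSTRING(@String, @Pos, 1)
                -- Everything else, including & < > " ', becomes &#N;
                ELSE N'&#' + CAST(@Code AS NVARCHAR(10)) + N';'
            END;
        SET @Pos += 1;
    END

    RETURN @NewString;
END
GO
```

For example, dbo.HtmlEncodeNumeric(N'人Test - Ñ <&>') returns &#20154;Test &#45; &#209; &#60;&#38;&#62;.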

Jeremy Giaco
0

I haven't tried this solution myself, but what I would try is to use the SQL Server/.NET CLR integration and actually call the C# HtmlEncode function from T-SQL. This may be inefficient, but I suspect it would give you the most accurate result.

My starting point for working out how to do this would be http://msdn.microsoft.com/en-us/library/ms254498%28VS.80%29.aspx

Mark
-1

Assign it to the Text property of a Label; it will be auto-encoded by .NET.

-1

OK, here is what I did. I created a simple function to handle it. It's far from complete, but at least it handles the standard <, > and & characters. I'll just add to it as I go along.

CREATE FUNCTION HtmlEncode
(
    @UnEncoded as varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
    DECLARE @Encoded as varchar(500)   
    SELECT @Encoded = Replace(@UnEncoded,'<','&lt;')
    SELECT @Encoded = Replace(@Encoded,'>','&gt;')
    SELECT @Encoded = Replace(@Encoded,'&','&amp;')   
    RETURN @Encoded    
END

I can then use:

Select Ref,dbo.HtmlEncode(RecID) from Customers

This gives me an HTML-safe record ID. There is probably a built-in function, but I can't find it.

p.campbell
Leo Moore