0

We are using a 3rd party application (i.e. I have no control over) that is storing RTF data as SQL Server's Image data type in SQL Server 2016. I am fully aware that Microsoft intends to depreciate image datatype, but as stated this is a 3rd party application.

My issue is I need to extract the actual RTF text data that is stored - is this even possible? I have SQL Server and/or SSIS to do this. The column in question is called NoteText; I have tried casting as varbinary then casting to varchar like this:

SELECT CAST(CAST(NoteText AS varbinary(max)) AS varchar(max)) AS test1
FROM Notes

which does get me on the right path as it produces:

{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}  \viewkind4\uc1\pard\f0\fs20 Useful Text I need\par  } 

However, I'm just interested in the:

Useful Text I need

from the result.

Within SSIS, I did try and use the Export Column transformation that did produce a rtf file but it was just full of unreadable characters.

Thank you in advance for any advice or tips.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Michael
  • 2,507
  • 8
  • 35
  • 71
  • 1
    Not with SQL. You'll have to do this with a client application and/or script. What you ask is a complex parsing operation. SQL, the language, is terrible at text manipulation and T-SQL doesn't even have regular expressions. – Panagiotis Kanavos Oct 29 '21 at 10:00
  • Which SQL Server version are you using? All supported versions allow you to use scripts in R, Python or Java (depends on the version). All versions in mainstream support can work with Python. SQL Server 2016 (which went out of mainstream support this year) only works with R. You can find a Python package that can extract the text. Check [Quickstart: Run simple Python scripts](https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-create-script?view=sql-server-ver15) – Panagiotis Kanavos Oct 29 '21 at 10:02
  • 1
    If you saved the blobs to files with `.rtf` extensions they can be opened with Wordpad on older versions of Windows, or most office programs such as Microsoft Office, LibreOffice or OpenOffice. – AlwaysLearning Oct 29 '21 at 10:03
  • @PanagiotisKanavos it is SQL Server 2016 – Michael Oct 29 '21 at 10:03
  • I wonder if it would be possible with a SSIS Script task? – Michael Oct 29 '21 at 10:04
  • SSIS script tasks can't use NuGet packages, although you can use regular expressions and .NET Framework 4 libraries. It can be painful. You *can* write regular expressions that remove specific tags and replace paragraph marks with newlines etc. It's equally possible to use any other script as an external CLI command though – Panagiotis Kanavos Oct 29 '21 at 10:06
  • 2
    You should check this [related SO question](https://stackoverflow.com/questions/44580580/how-to-convert-rtf-string-to-plain-text-in-python-using-any-library). The accepted answer uses only regular expressions and contains a list of RTF tags. – Panagiotis Kanavos Oct 29 '21 at 10:08

0 Answers0