A column in the database I work with contains RTF strings. I would like to strip the RTF markup out using PHP, leaving just the sentence in between.

It is an MS SQL Server 2005 database, if I recall correctly.

An example of the kind of string pulled from the database (let me know if you need more; all the rest are similar):

{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}
\viewkind4\uc1\pard\lang1033\f0\fs17 ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.\lang2057\fs17\par 
}

I would like this to be stripped to only return:

ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.

I have successfully stripped these characters in ASP.NET for a previous project; however, I would like to do the same using PHP. Here is the regular expression I used in ASP.NET, which works flawlessly, may I add:

"(\{.*\})|}|(\\\S+)"

However, when I try the same expression in PHP with preg_replace, it does not strip half of the characters.

Any regex gurus out there?

Charlie
  • Add delimiters `$string = preg_replace("#(\{.*\})|}|(\\\S+)#", "", $string);` – HamZa Jun 26 '13 at 10:52
  • Hmm, that seems to have fixed the error, thank you. However, it still leaves a bunch of RTF strings prepended to the string; maybe it doesn't work quite the same as in ASP.NET. – Charlie Jun 26 '13 at 10:56
  • I know, it does not do what you think it does. Anyways you should also use `\\\\S+` instead of `\\\S+` – HamZa Jun 26 '13 at 10:58
  • Ow, now I see something ugly: shouldn't you escape `}` like this `\}`? I'm not sure what you're trying to do, but when I read the question and saw "RTF strings" I thought it meant "right-to-left strings", which could be Arabic letters or something like that. Maybe if you post some sample input and the expected output, we could give you a better regex? – HamZa Jun 26 '13 at 11:04
  • RTF stands for rich text format, used by Microsoft. Added example input and expected output to OP. – Charlie Jun 26 '13 at 11:08
  • lol I've no idea how I could interpret it as "Right to **L**eft" – HamZa Jun 26 '13 at 11:10
  • My bad, in PHP it's preferred to escape a backslash with 4 backslashes, and you need one more backslash for `\S`, so it becomes 5 backslashes: `#(\{.*?\})|\}|(\\\\\S+)#` – HamZa Jun 26 '13 at 11:16
  • Woah, boom! Absolutely correct answer sir! Submit as answer or something, works perfect for all records tested. – Charlie Jun 26 '13 at 11:20
  • I'm not convinced this is a good pattern (or actually a safe one), we may wait for someone to come up with a better idea. – HamZa Jun 26 '13 at 11:22
  • Tried and tested for over 20 different records, I'd say it's a good pattern. Safe in terms of what exactly? – Charlie Jun 26 '13 at 11:33
  • Just try [this out](http://codepad.org/BM8ulFEy). Since `.*` is greedy, it will match everything when the input is all on one line. I'd say you got lucky because your data has newlines; note that `.` doesn't match newlines unless you use the `s` modifier. – HamZa Jun 26 '13 at 11:38
  • Seems to also work fine yes, note that I am using the trim function after the preg_replace as well. – Charlie Jun 26 '13 at 11:42
  • if everything is in 1 line it will delete characters that you don't want to delete. – HamZa Jun 26 '13 at 11:45
  • I did try all of the answers given in that thread before posting my own, none of them seemed to work. – Charlie Jun 27 '13 at 06:59
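
The greedy-versus-lazy concern raised in the comments above can be reproduced with a short sketch. It assumes a hypothetical record holding the same RTF as the question's sample, but stored on a single line; the variable names are illustrative only.

```php
<?php
// Hypothetical single-line record: the question's sample RTF with the line breaks removed.
// Single-quoted strings keep every backslash in this text literal.
$oneLine = '{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}'
    . '\viewkind4\uc1\pard\lang1033\f0\fs17 ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.\lang2057\fs17\par }';

// Both patterns reach PCRE with \\\S+ (a literal backslash followed by non-whitespace).
$greedy = '#(\{.*\})|\}|(\\\\\S+)#';  // .* runs from the first "{" to the last "}"
$lazy   = '#(\{.*?\})|\}|(\\\\\S+)#'; // .*? stops at the first "}"

$greedyResult = trim(preg_replace($greedy, '', $oneLine));
$lazyResult   = trim(preg_replace($lazy, '', $oneLine));

var_dump($greedyResult); // string(0) "": the sentence was deleted along with the markup
var_dump($lazyResult);   // the sentence survives
```

Even the lazy pattern is only a heuristic for simple records like this one, not an RTF parser; text that sits directly against a control word with no separating space would still be removed by `\S+`.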

1 Answer

Use this code. It will work fine.

$string = preg_replace("/(\{.*\})|}|(\\\S+)/", "", $string);

Note that I added a '/' delimiter at the beginning and at the end of the regex.
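
Why the delimiters matter can be seen in a minimal sketch. The input string here is made up for illustration, and the backslash before `\S` is written with the five-backslash form suggested in the question's comments so that only the delimiters differ between the two calls.

```php
<?php
// Hypothetical input, just to have something for the pattern to work on.
$text = 'a {b} c';

// Without delimiters, PHP takes the leading "(" as a bracket-style delimiter,
// ends the pattern at its matching ")", and reads the rest as modifiers.
// That raises an "Unknown modifier" warning (suppressed here with @) and returns null.
$noDelimiters = @preg_replace("(\{.*\})|}|(\\\\\S+)", "", $text);

// With explicit delimiters the whole expression is compiled as intended.
$withDelimiters = preg_replace("/(\{.*\})|}|(\\\\\S+)/", "", $text);

var_dump($noDelimiters);   // NULL
var_dump($withDelimiters); // string(4) "a  c"
```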

Jahanzeb
  • @HamZa yes, you are right. just edited the code. – Jahanzeb Jun 26 '13 at 11:11
  • Returns: `\viewkind4\uc1\pard\f0\fs17 ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.\fs17\par` Not quite. I should add that there were about 20 leading and trailing spaces on it too. – Charlie Jun 26 '13 at 11:12
  • @Charlie No it will return this **ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.** Try this code here http://www.solmetra.com/scripts/regex/index.php – Jahanzeb Jun 26 '13 at 11:14
  • It does seem to work there, yes, but it doesn't in my PHP application. What could cause this? – Charlie Jun 26 '13 at 11:16
  • Share your code please. let me see what you are trying. – Jahanzeb Jun 26 '13 at 11:18
  • I was using this exactly, in its raw form: `echo preg_replace("/(\{.*\})|}|(\\\S+)/", "", $row['OperationDescription']);` – Charlie Jun 26 '13 at 11:25
  • However HamZa's solution in the comments in the OP works flawlessly. – Charlie Jun 26 '13 at 11:26
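
A likely explanation for the mismatch between the online tester and the PHP application is string escaping: a tester receives the pattern text as typed, while PHP parses the quoted string first and collapses `\\` into a single backslash before PCRE sees it. The sketch below prints what each pattern looks like after PHP's parsing and applies both to the sample from the question; the variable names are illustrative only.

```php
<?php
// The sample RTF from the question, in a nowdoc so every backslash stays literal.
$rtf = <<<'RTF'
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}
\viewkind4\uc1\pard\lang1033\f0\fs17 ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.\lang2057\fs17\par 
}
RTF;

$answerPattern  = "/(\{.*\})|}|(\\\S+)/";       // as written in the answer
$commentPattern = "#(\{.*?\})|\}|(\\\\\S+)#";   // five backslashes, from the comments

// What PCRE actually receives after PHP has parsed the double-quoted strings:
echo $answerPattern, PHP_EOL;  // /(\{.*\})|}|(\\S+)/     -> a backslash followed by letters "S"
echo $commentPattern, PHP_EOL; // #(\{.*?\})|\}|(\\\S+)#  -> a backslash followed by non-whitespace

$answerResult  = trim(preg_replace($answerPattern, "", $rtf));
$commentResult = trim(preg_replace($commentPattern, "", $rtf));

echo $answerResult, PHP_EOL;   // control words on the second line are left in place
echo $commentResult, PHP_EOL;  // ASSEMBLE COMPONENTS AS DETAILED ON DRAWING.
```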