0

I would like the SQL to retrieve text data from a simple single MySQL table, where every (space delimited) word in a field is doubled / immediately repeated in the result.

Sample table

+--------------------+
| text_data          |
+--------------------+
| The man walks.     |
| Apple pie.         |
| This is a sentence |
+--------------------+

Desired output.

+---------------------------------------+
| output                                |
+---------------------------------------+
| The The man man walks. walks.         |
| Apple Apple pie. pie.                 |
| This This is is a a sentence sentence |
+---------------------------------------+

The order of the words is important.

Edit 1: This does have to be done in dynamically in SQL (no temp tables, no php) due to the application layer (the context is the index generation in Sphinx) and the ever changing nature of the source data.

Edit 2 (more detailed context): I want the wordforms functionality in Sphinx to handle certain pairs of words non-destructively. That is, if I turn "playstation 3" into "playstation playstation 3 3", I can still apply the word form "playstation 3 > ps3", and effectively the article would be searchable by both the queries "playstation" and "ps3". If I don't double up, but still apply the wordform, then I lose the ability to retrieve the article by searching for "playstation".

Riedsio
  • 9,758
  • 1
  • 24
  • 33

3 Answers3

1

A few options...

  1. You could use XMLpipe/2 indexing method to index the data for sphinx. Then you can use arbitary (PHP etc) code during indexing. That way you could use php to solve your particular problem. http://sphinxsearch.com/docs/current.html#xmlpipe2

  2. If you have a small wordforms file, you could just hard-code the specific cases into mysql

    sql_query = SELECT id,REPLACE(text_data, 'playstation 3', 'playstation playstation 3 3') AS text_data,....
    

    (perhaps using nested REPLACE(...) calls)

    (if case is an issue, maybe the PREG_REPLACE UDF could be used)

Doubling all words, seems like a bad idea anyway, as it will break sphinx ranking and the ability to search three+ phrase terms (without some hacks). So using xmlpipe, so the PHP can read the wordforms directly, and only douple up the required terms.

barryhunter
  • 20,886
  • 3
  • 30
  • 43
0

I'm not sure I would choose sql as my first language to do this, but if I had to tackle this I would.

  1. create a split function that returns a table if I didn't already have one handy.
  2. update every record in the table to the word plus itself.
  3. select all records from the table into a concatenated result.

Let me know if any of these steps isn't explicit enough.

Matthew Vines
  • 27,253
  • 7
  • 76
  • 97
  • Related to your answers 1st step [Can you split/explode a field in a MySQL query?](http://stackoverflow.com/questions/471914/can-you-split-explode-a-field-in-a-mysql-query) – Conrad Frix Dec 20 '11 at 22:52
  • The context does not permit temp structures or processing in other-layers. My Question was edited to reflect this. – Riedsio Dec 20 '11 at 22:52
0

You can write your own function that do this. Here a first approach. Notice that this is only a template, you should coding for a bit:

DROP FUNCTION IF EXISTS REPEAT_WORD
DELIMITER //
CREATE FUNCTION REPEAT_WORD(
   X TEXT
)
RETURNS TEXT
BEGIN
   DECLARE Y TEXT, PAROLE VARCHAR(255);
   WHILE INSTR( X, ' ') > 0 DO
      SET PAROLE = get first word from X
      SET Y = Y + ' ' + PAROLE + ' ' + PAROLE 
      SET X = remove PAROLE from X ...
   END WHILE;
   RETURN Y;
END
//

then use the function in select:

select REPEAT_WORD( text_data) as `output` 
from table
dani herrera
  • 48,760
  • 8
  • 117
  • 177