2

I would like to sort a result from a MySQL query by the number of words in a specified column.

Something like this:

SELECT bar FROM foo ORDER BY WordCountFunction(bar)

Is it possible?

nickhar
  • 19,981
  • 12
  • 60
  • 73
Martin Majer
  • 3,274
  • 4
  • 23
  • 36
  • Check mysql LENGTH() ! http://stackoverflow.com/questions/1870937/mysql-how-to-select-data-by-string-length – Vucko Nov 03 '12 at 23:49
  • In what language are these words written? – Jocelyn Nov 03 '12 at 23:50
  • 1
    @Vucko the correct spelling is `LENGTH`. And this function will not return the number of words in a text. – Jocelyn Nov 03 '12 at 23:51
  • Thanks, i missed it somehow. But what about LEN() ? http://www.w3schools.com/sql/sql_func_len.asp – Vucko Nov 03 '12 at 23:58
  • `LEN()` just returns the length of a field, and not how many words are potentially included in it. From memory, I don't think it's even valid in MySQL either. – nickhar Nov 04 '12 at 00:06
  • Official MySQL documentation: [String functions](http://dev.mysql.com/doc/refman/5.5/en/string-functions.html). There is really no need to refer to W3schools website. – Jocelyn Nov 04 '12 at 00:31

3 Answers3

5

As far as I know, there is no word count function in MySQL, but you can count the number of spaces and add one if your data is formatted properly (space separator for words, no spaces at beginning/end of entry).

Here is the query listing longest words first:

SELECT bar FROM foo ORDER BY (LENGTH(bar) - LENGTH(REPLACE(bar, ' ', ''))+1) DESC
Steven Liao
  • 3,577
  • 3
  • 19
  • 25
2

Using this method to count the number of words in a column, your query would look like this:

SELECT bar FROM foo ORDER BY (LENGTH(bar) - LENGTH(REPLACE(bar, ' ', ''))+1) DESC

sacohe
  • 768
  • 2
  • 5
  • 13
1

Yes, sort of. It won't be 100% accurate though:

SELECT SUM(LENGTH(bar) - LENGTH(REPLACE(bar, ' ', ''))+1) as count
FROM table
ORDER BY count DESC

But this assumes the words are separated by a space ' ' and doesn't account for punctuation. You can always replace it with another char and it doesn't account for double spaces or other chars either.

For complete accuracy, you could always pull the result out and word-count in your language of choice - where accurate word-count function do exist!

Hope this helps.

nickhar
  • 19,981
  • 12
  • 60
  • 73
  • I'd agree that it's probably easier to do the word count in an external language- you could just use a tokenizer and trim properly. Replacing spaces seems like somewhat of a hack-ish solution. – Steven Liao Nov 04 '12 at 00:00
  • @StevenLiao Yes, used as an approximation, this is a good technique, but not accurate 100% on written texts. Good if you've got many fields to scan over and return an approx result set. – nickhar Nov 04 '12 at 00:01