7

Sorry if this is something that was asked before, but I am not able to find the answer to this. How do I create a MySQL select statement to select all strings from the database whose first characters is either in lowercase or uppercase.

I'd normally use something like this to select 10 random words from English Word Database with length of more than 8 characters:

SELECT word FROM en_dictionary WHERE CHAR_LENGTH(word)>8 ORDER BY RAND() LIMIT 10

Unfortunately, this also selects the words in Capital letters that I don't want in the results, as they are usually the terms such as names of cities, species etc.:

enter image description here

So, my question is: Is there a way to select only the words that start with the lowercase alphabet characters (or uppercase)?

I could do something like this and it'd work, but it'd be a very ugly statement:

AND word LIKE BINARY 'a%' AND word LIKE BINARY 'b%' AND word LIKE BINARY 'c%'...
jjj
  • 2,594
  • 7
  • 36
  • 57

3 Answers3

10

You can use Ascii() function. It returns the numeric value of the leftmost character of the input string.

For lowercase first character: ASCII code of a is 97 and z is 122. So, the ASCII code of the first character should be Between 97 and 122.

SELECT word 
FROM en_dictionary 
WHERE CHAR_LENGTH(word) > 8 
  AND ASCII(word) BETWEEN 97 AND 122
ORDER BY RAND() LIMIT 10

For uppercase first character: ASCII code of A is 65 and Z is 90. So, the ASCII code of the first character should be Between 65 and 90.

SELECT word 
FROM en_dictionary 
WHERE CHAR_LENGTH(word) > 8 
  AND ASCII(word) BETWEEN 65 AND 90
ORDER BY RAND() LIMIT 10
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
3

Not a total answer, but way to large for a comment so i made a answer out of it.

I advice you to use a case sensitive utf8 collate like utf8_bin.
Then you don't need to use BINARY or other MySQL "hacks" to get the results you need you can simply use LIKE/BETWEEN with case sensitivity without problems.

Create table

CREATE TABLE test (

    utf8_general_ci_word VARCHAR(255) COLLATE utf8_general_ci
  , utf8_bin_word VARCHAR(255) COLLATE utf8_bin
);

INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('a', 'a');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('A', 'A');

INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('b', 'b');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('B', 'B');

Results

Query #1

    SELECT 
     utf8_general_ci_word
    FROM
     test 
    WHERE
     utf8_general_ci_word LIKE 'a%'

    ;

| utf8_general_ci_word |
| -------------------- |
| a                    |
| A                    |

Query #2

    SELECT 
     utf8_general_ci_word
    FROM
     test 
    WHERE
     utf8_general_ci_word LIKE BINARY 'a%'

    ;

| utf8_general_ci_word |
| -------------------- |
| a                    |

Query #3

    SELECT 
     utf8_bin_word
    FROM
     test 
    WHERE
     utf8_bin_word LIKE 'a%';

| utf8_bin_word |
| ------------- |
| a             |

demo

So your query would be something like

Query

SELECT
  word
FROM
  en_dictionary
WHERE
    CHAR_LENGTH(word) > 8
  AND
    word BETWEEN 'a' AND 'z'
ORDER BY
 RAND()
LIMIT 10

demo on the test data

Edited on 08-11-2019

This will miss words like 'zahn' because it's not <= 'z', use 'zzzzzzz' instead

Thanks to dnoeth 's comment and a very late sparring session later on 07-11-2019 there is indeed a bug in the SQL above for some cases..

As word BETWEEN 'a' AND 'z' is syntax sugar for word >= 'a' AND word <= 'z' this will indeed not match words greater then z like zahn or zzzzzzzzz for that matter. A valid patch for the query above is to use REPEAT('<char>', <number_of_size_of_table_column_datatype>) as the max range meaning in this case word BETWEEN 'a' AND REPEAT('z', 255)

the demo of the patch..

Raymond Nijland
  • 11,488
  • 2
  • 22
  • 34
  • Shouldn't it be `LEFT(word,1) Between 'a' and 'z'` – Madhur Bhaiya Oct 30 '18 at 12:13
  • "Shouldn't it be LEFT(word,1) Between 'a' and 'z'" No i use MySQL's string conversion to mine advantage @MadhurBhaiya check this https://www.db-fiddle.com/f/tCUgdpcwH7E78gviia6yqv/4 .. Besides if word is indexed i would lose the best index usage with `LEFT(word,1)` https://www.db-fiddle.com/f/tCUgdpcwH7E78gviia6yqv/6 – Raymond Nijland Oct 30 '18 at 12:20
  • Interesting +1; did some more experiements: https://www.db-fiddle.com/f/tCUgdpcwH7E78gviia6yqv/7 – Madhur Bhaiya Oct 30 '18 at 12:22
  • thanks for this late sparring session @dnoeth, the answer it updated with a patch which now does return correctly .. – Raymond Nijland Nov 08 '19 at 15:01
  • 1
    *"Shouldn't it be LEFT(word,1) Between 'a' and 'z'"* @MadhurBhaiya see the updated answer, a patch was indeed required to let this work correctly for all cases it was overlooked long ago :-) ... The plus the index usage is still [good](https://www.db-fiddle.com/f/tCUgdpcwH7E78gviia6yqv/8) – Raymond Nijland Nov 08 '19 at 15:39
2

You can use SUBSTRING and REGEXP BINARY

SELECT word 
FROM en_dictionary 
WHERE CHAR_LENGTH(word)>8 
AND SUBSTRING(word, 1, 1) REGEXP BINARY '[A-Z]'
ORDER BY RAND() LIMIT 10

With REGEXP BINARY You can use any regex as you want, see documentation : https://dev.mysql.com/doc/refman/8.0/en/regexp.html

SUBSTRING is simply for get the first letter.

See

How to check for uppercase letters in MySQL?

MySQL Select Query - Get only first 10 characters of a value

HRoux
  • 445
  • 10
  • 15