2
mysql>show full columns from  bpsw;
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| Field  | Type             | Collation | Null | Key | Default | Extra          | Privileges                      | Comment |
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| bpswid | int(10) unsigned | NULL      | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| badpsw | varchar(128)     | utf8_bin  | NO   | UNI | NULL    |                | select,insert,update,references |         |
+--------+------------------+-----------+------+-----+---------+----------------+---------------------------------+---------+

dont say me about not null and default null :)

mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+-------------------+
| USER()         | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| root@localhost | utf8            | utf8_general_ci   |
+----------------+-----------------+-------------------+
1 row in set (0.00 sec)

table contents:

mysql> select * from bpsw limit X offset XXX ;
+--------+------------------------+
| bpswid | badpsw                 |
+--------+------------------------+
| 495883 | by all manner of means |
| 495884 | by all means           |
| 495885 | by all odds            |
| 495886 | by an ace              |
| 495887 | by an iota             |
| 495888 | by and by              |
| 495889 | by and large           |
| 495890 | by any chance          |
| 495891 | by any manner of means |
| 495892 | by any means           |
+--------+------------------------+
...
|   94950 | яростных                                                    |
|    1599 | ярь-медянка                                                 |
|    1600 | ястреб-перепелятник                                         |
|    1601 | ястреб-тетеревятник                                         |
|   94999 | яфетический                                                 |
|    1603 | яхт-клуб                                                    |
|    1604 | яхт-клуба                                                   |
...
|    1938 | яванский желоб                                              |
|    1939 | яванское море                                               |
|   94690 | еще какое-то слово                                          |
|    1940 | яде-бузен залив                                             |
|   94751 | ядерного                                                    |
|   94755 | раз два-три                                                 | 

need: select or remove from the table cyrillic words.

it is necessary to delete the rows in which ONE Cyrillic word WITHOUT digit, without special chars, without chars of punctuation.

Condition for removal: '^[а-я]+[а-я]$+'

select * from bpsw where badpsw regexp '^[a-z]+[a-z]$+';

With English words there are no problems, but with the Cyrillic alphabet I will not understand HOW it make.

I think that it is necessary to specify a collate?

UPD: mysql regex utf-8 characters ?

can here i need to look at the meanings of the Cyrillic symbols in the byte representation?

Aliskin
  • 169
  • 1
  • 9
  • Give a sample character from words you want to delete. – Vijunav Vastivch Dec 20 '17 at 00:56
  • Or Give a clear list Example you want to delete. – Vijunav Vastivch Dec 20 '17 at 00:59
  • @reds,, https://ru.wikipedia.org/wiki/%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82 all in lowercase а б в г д е ж з и й к л м н о п р с т у ф ч .... – Aliskin Dec 20 '17 at 01:33
  • All of that characters are to be deleted? – Vijunav Vastivch Dec 20 '17 at 01:34
  • @reds "to delete the rows in which ONE Cyrillic word WITHOUT digit, without special chars, without chars of punctuation." can I do without byte comparison? – Aliskin Dec 20 '17 at 01:36
  • Like what? can you show it here? some example word? any? – Vijunav Vastivch Dec 20 '17 at 01:36
  • You mean all of this are to be deleted? `94950 | яростных 1599 | ярь-медянка 1600 | ястреб-перепелятник 1601 | ястреб-тетеревятник 94999 | яфетический 1603 | яхт-клуб 1604 | яхт-клуба ... | 1938 | яванский желоб | 1939 | яванское море | 94690 | еще какое-то слово | 1940 | яде-бузен залив | 94751 | ядерного | 94755 | раз два-три – Vijunav Vastivch Dec 20 '17 at 01:40
  • chars: а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я – Aliskin Dec 20 '17 at 01:42
  • words to delete: | яростных | яфетический | ядерного | – Aliskin Dec 20 '17 at 01:42
  • @reds **UPD:** https://stackoverflow.com/questions/19774618/mysql-regex-utf-8-characters ? can here i need to look at the meanings of the **Cyrillic symbols in the _byte representation_**? – Aliskin Dec 20 '17 at 11:58

3 Answers3

1

It's easy:

select * from bpsw where badpsw regexp '^[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+$'; 

https://linux.org.ua/index.php?topic=11272.msg201662#msg201662

Aliskin
  • 169
  • 1
  • 9
1

SELECT USER(), CHARSET(USER()), COLLATION(USER()); says virtually nothing. USER() returns a string that is defined by the table that "user" comes from. It is always utf8.

Perhaps what you want is

mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

REGEXP works only with bytes, not non-ascii characters, such as the 2-byte Cyrillic utf8 characters. (Recent MariaDB versions can handle such.)

Instead of SHOW FULL COLUMNS..., use SHOW CREATE TABLE... -- that will make it more obvious whether the columns are NOT NULL or NULL.

This can discover Cyrillic strings:

WHERE HEX(col) REGEXP '^(..)*D[0-4]'

because, in utf8, all Cyrillic characters are 2 bytes, the first of which is HEX D0, D1, D2, D3, or D4.

SELECT HEX('яростных') REGEXP '^(..)*D[0-4]'; --> 1

You want to "remove" Cyrillic? If you want to remove the rows, then a DELETE with that WHERE will suffice. If you want to edit the text, that will take application code (unless you have MariaDB and can use REGEXP_REPLACE() ).

Have I addressed all the issues?

Edit

To delete the row where col1 contains just a single Cyrillic word,

 DELETE FROM tbl
     WHERE HEX(col1) REGEXP '^(D[0-4]..)+$';
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • thanks for the detailed explanation. I have already solved this "problem". It was necessary **not just to delete the Cyrillic alphabet**, but `to delete the rows that are WITH ONE Cyrillic word` WITHOUT spaces, WITHOUT hyphens, WITHOUT special chars. – Aliskin Dec 23 '17 at 16:25
  • @root_xPovierennyy - I added the regexp for that. – Rick James Dec 23 '17 at 17:55
  • %) I instead listed the entire Cyrillic alphabet. regexp '^[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+$'; I'm so more clear. And with encodings (more precisely with bytes) - this is horrible:;)) – Aliskin Dec 24 '17 at 02:46
  • @root_xPovierennyy - Yes, my solution is horrible. But it works. Your `REGEXP` may incorrectly match other non-english languages. – Rick James Dec 24 '17 at 13:34
0

Try to take a look at this:

Is this the Record you want to delete?

 select  bpswid,badpsw from  Your_Table
 where  badpsw like '%б%' or  badpsw like '%в%'
 or  badpsw like '%г%'
 or   badpsw like '%д%'
 or   badpsw like '%ё%'
 or  badpsw like '%ж%'
 or  badpsw like '%з%'
 or  badpsw like '%и%'
 or  badpsw like '%й%'
 or  badpsw like '%к%'
 or  badpsw like '%л%'
 or  badpsw like '%м%'
 or  badpsw like '%н%'
 or  badpsw like '%п%'
 or  badpsw like '%т%'
 or  badpsw like '%ф%'
 or  badpsw like '%у%'
 or  badpsw like '%ц%'
 or  badpsw like '%ч%'
 or  badpsw like '%ш%'
 or  badpsw like '%щ%'
 or  badpsw like '%ъ%'
 or  badpsw like '%ы%'
 or  badpsw like '%ь%'
 or  badpsw like '%э%'
 or  badpsw like '%ю%'
 or  badpsw like '%я%'
 or  badpsw like '%с%'
 or  badpsw like '%р%'
 or  badpsw like '%х%'
 or  badpsw like '%е%'
or  badpsw like '%о%'

Result:

94950   яростных
1599    ярь-медянка
1600    ястреб-перепелятник
1601    ястреб-тетеревятник
94999   яфетический
1603    яхт-клуб
1604    яхт-клуба
1938    яванский желоб
1939    яванское море
94690   еще какое-то слово
1940    яде-бузен залив
94751   ядерного
94755   раз два-три

if not then just remove the other in like condition. then proceed deletion of record.

Vijunav Vastivch
  • 4,153
  • 1
  • 16
  • 30