
First I created a database with utf8mb4_general_ci collation and created the table with the same collation. Then I imported the CSV file with:

LOAD DATA LOCAL INFILE '/mnt/c/Users/justi/Desktop/enml/enml.csv'
INTO TABLE dict
CHARACTER SET utf8mb4
FIELDS TERMINATED BY '\t'
IGNORE 1 ROWS;

Sample data


+--------+----------------+----------------+---------------------------------+
| id     | english_word   | part_of_speech | malayalam_definition            |
+--------+----------------+----------------+---------------------------------+
| 174569 | .net           | n              | പുത്തന്‍ കമ്പ്യൂട്ടര്‍ സാങ്കേതികത ഭാഷ      |
+--------+----------------+----------------+---------------------------------+
| 116102 | A bad patch    | n              | കുഴപ്പം പിടിച്ച സമയം               |
+--------+----------------+----------------+---------------------------------+
| 219752 | a bag of bones | phr            | വളരെയതികം മെലിഞ്ഞ വ്യക്തി അഥവാ മൃഗം |
+--------+----------------+----------------+---------------------------------+

I checked with

SELECT malayalam_definition FROM dict;

and then var_dump($row); gives:

array(1) { ["malayalam_definition"]=> string(19) "ശരശയ്യ " }
array(1) { ["malayalam_definition"]=> string(22) "പൂമെത്ത " }
array(1) { ["malayalam_definition"]=> string(41) "സുഖകരമായ അവസ്ഥ " }
array(1) { ["malayalam_definition"]=> string(44) "അസുഖകരമായ അവസ്ഥ " }
array(1) { ["malayalam_definition"]=> string(22) "പൂമെത്ത " }
array(1) { ["malayalam_definition"]=> string(123) "സുഖകരമെങ്കിലും സ്വാതന്ത്യ്രമില്ലാത്ത അവസ്ഥ " }
...

There is an unknown character after each definition, e.g. "ശരശയ്യ ". I tried SELECT TRIM(malayalam_definition) FROM dict; but it gives the same result. How can I find out what that character is?
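To see the character from the PHP side, the raw bytes at the end of the fetched value can be printed as hex codes. A minimal sketch; trailingBytes() is a hypothetical helper, and $row is assumed to be the array passed to var_dump() above:

```php
<?php
// Hypothetical helper: return the last $n bytes of $s as hex codes, so that
// invisible characters (CR, LF, tab, NUL, ...) become visible.
function trailingBytes(string $s, int $n = 3): array
{
    if ($s === '') {
        return [];
    }
    $codes = [];
    // Work byte by byte: every Malayalam letter is 3 bytes in UTF-8,
    // so a stray control character shows up as a single extra byte.
    foreach (str_split(substr($s, -$n)) as $byte) {
        $codes[] = sprintf('0x%02X', ord($byte));
    }
    return $codes;
}

// Usage with a row fetched as in the question:
// print_r(trailingBytes($row['malayalam_definition']));
```

Because it works on bytes rather than characters, it needs no multibyte extension and shows exactly what MySQL stored.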

Khtty

1 Answer


Converting the string to hex is one way:

SELECT HEX(malayalam_definition),CONCAT("{",malayalam_definition,"}")
FROM dict
WHERE id=116102
danblack
  • PHP gives `syntax error, unexpected '") FROM dict WHERE id=116102"' (T_CONSTANT_ENCAPSED_STRING)`; running the query directly gives `E0B495E0B581E0B4B4E0B4AAE0B58DE0B4AAE0B48220E0B4AAE0B4BFE0B49FE0B4BFE0B49AE0B58DE0B49A20E0B4B8E0B4AEE0B4AFE0B4820D` and `{കുഴപ്പം പിടിച്ച സമയം }` – Khtty Feb 11 '19 at 08:40
  • Yes, the quotes need to be adjusted for your PHP quoting context. – danblack Feb 11 '19 at 08:48
  • Do you see any problem in the hex value? When I convert it back, I get `കുഴപ്പം പിടിച്ച സമയം `. If you look closely, there seems to be a space at the end? – Khtty Feb 11 '19 at 08:55
  • Found it: `0D` is the unknown character. What is this value? Is it a space, or `\r`? – Khtty Feb 11 '19 at 11:47
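0x0D is the ASCII carriage return (CR, \r). The CSV was most likely saved with Windows line endings (CR LF, \r\n), and LOAD DATA splits lines on \n by default (LINES TERMINATED BY '\n'), so the \r stays attached to the last column, malayalam_definition. That also fits the var_dump() lengths: "ശരശയ്യ" is 6 Malayalam code points, i.e. 18 bytes of UTF-8, but is reported as string(19). MySQL's TRIM() without an explicit remove string strips only spaces, which is why it left the CR in place; adding LINES TERMINATED BY '\r\n' to the import, or running TRIM(TRAILING '\r' FROM malayalam_definition) on the existing rows, should remove it. The PHP sketch below only simulates the default line and field splitting on a hypothetical header plus one data row, to show where the stray byte comes from; it does not connect to MySQL:

```php
<?php
// Hypothetical sample in the same layout as enml.csv (tab-separated,
// header row first), saved with Windows CRLF line endings.
$csv = "id\tenglish_word\tpart_of_speech\tmalayalam_definition\r\n"
     . "116102\tA bad patch\tn\tകുഴപ്പം പിടിച്ച സമയം\r\n";

// Mimic LOAD DATA defaults: split lines on "\n", then fields on "\t".
$lines = explode("\n", rtrim($csv, "\n")); // strip the trailing "\n"; the "\r" before it stays
array_shift($lines);                       // IGNORE 1 ROWS: skip the header line
$fields = explode("\t", $lines[0]);
$definition = $fields[3];                  // value that lands in malayalam_definition

printf("default split: %d bytes, last byte 0x%02X\n",
    strlen($definition), ord(substr($definition, -1)));

// Splitting lines on "\r\n" instead (LINES TERMINATED BY '\r\n') leaves no stray byte.
$fixed = explode("\t", explode("\r\n", $csv)[1])[3];
printf("CRLF split:    %d bytes, last byte 0x%02X\n",
    strlen($fixed), ord(substr($fixed, -1)));
```

The only difference between the two values is the trailing CR, which is the 0D at the end of the HEX() output.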