I am trying to read a USB device's label through the udev library, but I run into a problem when the label is not UTF-8 encoded.

The USB drive was formatted on Windows with a FAT32 file system. Its label is “РФПАЦУ” (Cyrillic, chosen for testing, stored in the CP866 code page). To get the device properties, I run the following command:

sudo /sbin/blkid -o udev -p /dev/sdd1

The output is as follows:

ID_FS_LABEL=______
ID_FS_LABEL_ENC=\x90\x94\x8f\x80\x96\x93

According to https://bbs.archlinux.org/viewtopic.php?id=197582:

ID_FS_LABEL contains plain ascii, with hex-escaped and any valid utf8 characters but all whitespaces are replaced with '_' , while in ID_FS_LABEL_ENC all potentially unsafe characters are replaced by the corresponding hex value prefixed by '\x'.

I cannot just unhex the ID_FS_LABEL_ENC since the amount of bytes to read is unknown.

Is there a way to find out the encoding of ID_FS_LABEL_ENC? Or a way to get the correct label of a USB device?

  • "I cannot just unhex the ID_FS_LABEL_ENC since the amount of bytes to read is unknown", uh? Read the string until the end of it... – Jean-Baptiste Yunès Apr 19 '19 at 12:20
  • @Jean-BaptisteYunès I meant that since the encoding is unknown, the number of bytes per character is unknown – Kaminskiy Gleb Apr 19 '19 at 12:25
  • I can read the string, but how do I convert it to UTF-8? – Kaminskiy Gleb Apr 19 '19 at 12:27
  • But the encoding is not unknown, it's cp866 as you said in your question ;-) You can just unhex it and pass it to `iconv -f cp866 -t utf-8` (or the iconv(3) API). It's absolutely not clear what you're after: are you searching for a C++ API that determines the encoding of a string by heuristics? I don't think there's any [reliable](https://en.wikipedia.org/wiki/Bush_hid_the_facts) algorithm. –  Apr 19 '19 at 13:22
  • @mosvy In this case it is cp866, but I assume that the label of a USB device can be in any other encoding. I want to know whether I can patch the kernel, get extra info to determine the encoding of the label, etc.: anything that gets the correct USB label on Linux. – Kaminskiy Gleb Apr 19 '19 at 13:58
  • The label of a FAT filesystem is just bytes. There's no info about its encoding stored anywhere. If it's not plain ASCII, you can only guess what encoding it may be in. You cannot determine anything. –  Apr 19 '19 at 16:00
  • @KaminskiyGleb FAT doesn't record an encoding (to my knowledge); labels are just bytes that are meant to be interpreted as ASCII (that was the common standard at the time). You can store any bytes in a label, but the way they must be interpreted cannot be specified. This is why many format tools forbid the use of exotic labels. So: don't use anything other than ASCII if you want to be portable. – Jean-Baptiste Yunès Apr 20 '19 at 07:50
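
The approach suggested in the comments (unhex the value, then convert it from a guessed code page) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the helper names `unescape_label` and `decode_label` are hypothetical, and the `cp866` fallback is a guess that happens to match the drive in the question, since FAT stores no encoding information that could be queried.

```python
import re

# Matches one "\xNN" escape as it appears in ID_FS_LABEL_ENC.
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def unescape_label(enc: str) -> bytes:
    """Turn the \\xNN escapes of an ID_FS_LABEL_ENC value back into raw bytes."""
    raw = enc.encode("utf-8")
    # Single pass: each escape becomes exactly one byte, everything else is kept.
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_label(enc: str, fallback: str = "cp866") -> str:
    """Decode a label as UTF-8 if it is valid UTF-8, else with a guessed code page.

    The fallback is an assumption made by the caller; it cannot be detected
    from the filesystem itself.
    """
    raw = unescape_label(enc)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

Calling `decode_label(r"\x90\x94\x8f\x80\x96\x93")` with the value from the question yields the label РФПАЦУ. For a drive labelled on a system with a different OEM code page, the `fallback` argument would have to be changed by hand or chosen from the user's locale, which is still a guess.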
