
C# IMAP search command with special characters like á, é

I am trying to implement the logic mentioned in the above post in C# so that I can run non-ASCII searches in Gmail. After successfully logging in to imap.gmail.com, I have the following exchange with the server:

(C -> S) Encoding.Default.GetBytes("A4 UID SEARCH CHARSET UTF-8 TEXT {4}\r\n");
(C <- S) "+ go ahead\r\n"
(C -> S) Encoding.Default.GetBytes("αβγδ\r\n");
(C <- S) "* SEARCH 72\r\nA2 OK SEARCH completed (Success)"

However, the email returned by the server is completely unrelated to the search term I provided. This happens only when the keywords contain non-ASCII characters, so I believe I am getting the encoding wrong.

I have also tried using Encoding.ASCII, but then the search results are even further off target.
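Looking at the actual bytes helps explain why the ASCII attempt drifts even further. The Python lines below are only an analogue of the C# calls, not the C# code itself: they assume that .NET's Encoding.ASCII replaces every non-ASCII character with '?' (as far as I know it does), which Python's errors="replace" imitates. Under that assumption the server would effectively be asked to search for "????".

```python
# Rough Python analogue of the two encodings discussed in the question.
# Assumption: .NET's Encoding.ASCII turns each non-ASCII character into '?',
# which errors="replace" mimics here.
term = "αβγδ"

utf8_bytes = term.encode("utf-8")                     # what the server expects with CHARSET UTF-8
ascii_bytes = term.encode("ascii", errors="replace")  # every Greek letter becomes '?'

print(utf8_bytes)   # b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4'
print(ascii_bytes)  # b'????'
```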

What is the proper way to send the string literal "αβγδ\r\n"?

XDS

1 Answer


For the search term you are using a so-called literal. The length of a literal has to be specified in octets, which is not the case in your example: the string "αβγδ" encoded in UTF-8 is eight octets long (two per Greek letter), not four.
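To make the character/octet difference concrete, the short Python snippet below prints each letter of the question's search term with its code point and UTF-8 bytes. In C#, the corresponding octet count would be Encoding.UTF8.GetBytes(term).Length rather than term.Length.

```python
term = "αβγδ"

# These letters are U+03xx code points; UTF-8 encodes U+0080..U+07FF as two octets.
for ch in term:
    encoded = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", encoded, len(encoded))

print("characters:", len(term))                 # 4, the value the question put in {4}
print("octets:", len(term.encode("utf-8")))     # 8, the value the literal actually needs
```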

So you should encode the search term first and send the length of the encoded bytes to the server.

I don't know much about C#, so here is an example in Python:

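# send() and read_until() stand for helpers (not shown) that write raw bytes
# to the IMAP connection and read until a line matches the given pattern.
search_term = 'Grüße'
encoded_search_term = search_term.encode('UTF-8')        # 7 octets for 5 characters
length = str(len(encoded_search_term)).encode('ascii')  # literal size in octets

send(b'. UID SEARCH CHARSET UTF-8 TEXT {' + length + b'}\r\n')
read_until(br'^\+ .*$')              # wait for the "+" continuation response

send(encoded_search_term + b'\r\n')  # the literal bytes, then CRLF to end the command
read_until(br'^\. OK .*$')           # wait for the tagged OK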
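The same idea can be packaged as a small, self-contained function that only builds the bytes, leaving the socket handling aside. The name build_literal_search is hypothetical (it is not part of any library); it returns the command line to send before waiting for the "+" continuation and the literal line to send after it.

```python
from typing import Tuple


def build_literal_search(tag: str, term: str) -> Tuple[bytes, bytes]:
    """Hypothetical helper: bytes for a UID SEARCH whose term is a UTF-8 literal."""
    payload = term.encode("utf-8")
    # The number in {...} counts octets of the encoded term, not characters.
    command = f"{tag} UID SEARCH CHARSET UTF-8 TEXT {{{len(payload)}}}\r\n".encode("ascii")
    # The CRLF after the literal ends the whole command; it is not counted in {...}.
    return command, payload + b"\r\n"


command, literal = build_literal_search("A4", "αβγδ")
print(command)  # b'A4 UID SEARCH CHARSET UTF-8 TEXT {8}\r\n'
print(literal)  # b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4\r\n'
```

With the tag "." and the term "Grüße", it produces exactly the two client lines shown in the transcript that follows.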

With this code, the search command returns the UIDs of the emails with the text "Grüße":

C: b'. UID SEARCH CHARSET UTF-8 TEXT {7}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'* SEARCH 1 3 4\r\n'
S: b'. OK UID SEARCH completed\r\n'
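The matches arrive in the untagged "* SEARCH" line; because the command was UID SEARCH, the numbers are UIDs rather than sequence numbers. A minimal parser for that line could look like the sketch below. parse_search_response is a hypothetical name, and the sketch assumes a plain response with no extension data (for example, no MODSEQ).

```python
from typing import List


def parse_search_response(line: bytes) -> List[int]:
    """Hypothetical helper: extract the numbers from a '* SEARCH ...' line."""
    prefix = b"* SEARCH"
    line = line.rstrip(b"\r\n")
    if not line.startswith(prefix):
        raise ValueError(f"not a SEARCH response: {line!r}")
    # Space-separated numbers; int() accepts ASCII digit bytes directly.
    return [int(token) for token in line[len(prefix):].split()]


print(parse_search_response(b"* SEARCH 1 3 4\r\n"))  # [1, 3, 4]
print(parse_search_response(b"* SEARCH\r\n"))        # [] (no matches)
```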

If I use the length in characters (len(search_term)) instead of the encoded length in octets (len(encoded_search_term)), the IMAP server reports an error:

C: b'. UID SEARCH CHARSET UTF-8 TEXT {5}\r\n'
S: b'+ Ready for literal data\r\n'
C: b'Gr\xc3\xbc\xc3\x9fe\r\n'
S: b'. BAD expected end of data instead of "\\237e"\r\n'
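The BAD response can be reproduced offline by slicing the bytes the way the server does. With {5}, the server takes only the first five octets as the literal, which cuts the two-byte "ß" (0xC3 0x9F) in half. It then finds 0x9F (octal 237) followed by "e" where it expected the end of the command.

```python
term = "Grüße"
encoded = term.encode("utf-8")   # b'Gr\xc3\xbc\xc3\x9fe', 7 octets
declared = len(term)             # 5: a character count, the wrong unit

literal = encoded[:declared]     # octets the server consumes as the literal
leftover = encoded[declared:]    # octets it then tries to parse as the rest of the command

print(literal)           # b'Gr\xc3\xbc\xc3'
print(leftover)          # b'\x9fe'
print(oct(leftover[0]))  # 0o237, the "\237" in the server's error message
```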

Note that I didn't use Gmail for my tests.

nosid
  • I stand corrected. I will test this as soon as I have some time. If I understand correctly, the two things I did wrong are: #1, I didn't convert the search term to UTF-8 bytes (I used Encoding.Default instead), and #2, I should put the number of encoded bytes where {4} is right now. Thanks again, and feel free to correct me if I am misinterpreting your answer. – XDS Apr 13 '12 at 10:07
  • This part is truly illuminating: "search_term = 'Grüße'; encoded_search_term = search_term.encode('UTF-8'); length = str(len(encoded_search_term)).encode('ascii')". I could never have figured it out on my own. Thanks a billion, m8. I wish I had enough points to upvote you. As I said, I will test this as soon as I have some time. Cheers. – XDS Apr 13 '12 at 18:34