Regex to match charset

Question

I've been trying to write a regex that matches the charset of MIME multipart emails so that I can decode them correctly. However, the format varies in ways I can't work out a regex for, as I'm no expert. Currently I'm using (?<=charset=).*(?=;), but the examples I've collected by sending emails from different clients are:

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

charset=US-ASCII;

Content-Type: text/plain; charset=iso-8859-1

So my regex works on the first two but not the last. However, if I remove (?=;), then I also match the format=flowed part, which I don't want.

score 5 · Accepted Answer · answered Jun 16 '10 at 11:16

Instead of .*, you can use [^;]*, which matches any run of characters other than ;.

So, the pattern becomes:

(?<=charset=)[^;]*
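As a quick check, the pattern can be run against the three headers from the question. This is a minimal sketch in Python (the question does not say which regex engine is in use, so the language is an assumption); `CHARSET_RE` and `find_charset` are hypothetical names introduced for illustration.

```python
import re

# Pattern from the answer: everything after "charset=" up to the next ";".
CHARSET_RE = re.compile(r"(?<=charset=)[^;]*")

# The three sample headers quoted in the question.
headers = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]


def find_charset(header):
    """Return the charset value found in the header, or None if absent."""
    match = CHARSET_RE.search(header)
    return match.group(0) if match else None


charsets = [find_charset(h) for h in headers]
print(charsets)  # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']
```

Because [^;]* stops at the first ; or at the end of the input, no lookahead is needed, and the third header (with no trailing ;) matches as well.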

References

regular-expressions.info/Character Classes

answered Jun 16 '10 at 11:16

polygenelubricants

nice one, I should have thought of that – ianbarker Jun 16 '10 at 11:28

score 1 · Answer 2 · edited May 15 '12 at 09:48

Building on this, I've found that the following pattern handles a couple more cases, since it also stops at a comma or a line break:

(?<=charset=)(([^;,\r\n]))*
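One case where the extra excluded characters matter is a raw block of headers in which the charset parameter is the last thing on its line. The sketch below, again in Python as an assumption about the environment, compares the two patterns on a made-up two-line header; `raw`, `loose` and `strict` are hypothetical names.

```python
import re

# Hypothetical raw header block: charset is the last parameter on its line.
raw = "Content-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: 7bit\r\n"

# Pattern from the accepted answer: stops only at ";".
loose = re.search(r"(?<=charset=)[^;]*", raw).group(0)

# Pattern from this answer: also stops at "," and at CR/LF.
strict = re.search(r"(?<=charset=)(([^;,\r\n]))*", raw).group(0)

print(repr(loose))   # 'utf-8\r\nContent-Transfer-Encoding: 7bit\r\n'
print(repr(strict))  # 'utf-8'
```

Only the whole match, group(0), is used here, so the nested capture groups in the pattern do not affect the result.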

Hope that helps.

edited May 15 '12 at 09:48

Verbeia

answered May 15 '12 at 09:39

Phil Kermeen

score 0 · Answer 3 · answered Jun 16 '10 at 11:15

Make the lookahead match on either ; or the end of the line ($).

answered Jun 16 '10 at 11:15

Sjoerd

If `.*` is greedy, this will overmatch if there are multiple `;` following `charset=` – polygenelubricants Jun 16 '10 at 11:30
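The overmatch described in this comment can be reproduced with the first header from the question. The sketch below assumes Python and assumes the answer means a lookahead of the form (?=;|$); the lazy quantifier .*? is one possible fix that is not mentioned in the answer itself.

```python
import re

header = "Content-Type: text/plain; charset=ISO-8859-1; format=flowed"

# Greedy .* runs to the end of the string, where $ satisfies the lookahead.
greedy = re.search(r"(?<=charset=).*(?=;|$)", header).group(0)

# Lazy .*? stops at the first position where the lookahead succeeds.
lazy = re.search(r"(?<=charset=).*?(?=;|$)", header).group(0)

# The lazy form also handles a header with no trailing ";".
lazy_last = re.search(
    r"(?<=charset=).*?(?=;|$)",
    "Content-Type: text/plain; charset=iso-8859-1",
).group(0)

print(greedy)     # ISO-8859-1; format=flowed
print(lazy)       # ISO-8859-1
print(lazy_last)  # iso-8859-1
```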
