3

I've been trying to write a regex that matches the charset of MIME multipart emails so that I can decode them correctly. However, the format varies between clients in ways I can't work out a regex for, as I'm no expert. Currently I'm using (?<=charset=).*(?=;) and the examples I've collected by sending emails from different clients are:

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

charset=US-ASCII;

Content-Type: text/plain; charset=iso-8859-1

So my regex works on the first two but not the last, because the last has no trailing ;. If I remove (?=;), it matches the last one but also captures the format=flowed part of the first, which I don't want.

ianbarker

3 Answers

5

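Instead of .*, you can use [^;]*. That is, match any run of characters other than ;, so the match stops at the first ; or at the end of the input, whichever comes first.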

So, the pattern becomes:

(?<=charset=)[^;]*
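
As a quick check, here is a minimal sketch that runs this pattern over the three headers from the question. It uses Python's re module purely for illustration (the question doesn't say which regex engine is in use), and the helper name find_charset is made up for this example.

```python
import re

# Pattern from the answer: lookbehind for "charset=", then everything
# up to (but not including) the next ";".
CHARSET_RE = re.compile(r"(?<=charset=)[^;]*")

# The three sample headers from the question.
samples = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]


def find_charset(header):
    """Return the charset value found in header, or None if there is none."""
    match = CHARSET_RE.search(header)
    return match.group(0) if match else None


charsets = [find_charset(s) for s in samples]
print(charsets)  # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']
```

All three headers yield just the charset name, with or without a trailing ;.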

polygenelubricants
1

Building on this, I've found that the following catches a couple more cases, because it also stops at a comma or a line break:

(?<=charset=)(([^;,\r\n]))*

Hope that helps.
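
One case where the extra characters matter is when the regex is run against a whole header block rather than a single line. The sketch below is only an illustration in Python: the two-line header string is invented for the example, and the variable names are not from either answer.

```python
import re

# Pattern from the first answer: stops only at ";".
SEMICOLON_ONLY = re.compile(r"(?<=charset=)[^;]*")
# Pattern from this answer: also stops at "," and at CR/LF.
STOP_AT_LINE_END = re.compile(r"(?<=charset=)(([^;,\r\n]))*")

# Hypothetical raw header block where charset is the last parameter on its line.
raw_headers = (
    "Content-Type: text/plain; charset=iso-8859-1\r\n"
    "Content-Transfer-Encoding: 7bit\r\n"
)

# A negated character class happily matches line breaks, so the first
# pattern runs on into the next header line.
loose = SEMICOLON_ONLY.search(raw_headers).group(0)
strict = STOP_AT_LINE_END.search(raw_headers).group(0)

print(repr(loose))
print(repr(strict))
```

Here the first pattern swallows the following header line as well, while the second one returns only iso-8859-1.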

Verbeia
Phil Kermeen
0

Make the lookahead match on either ; or the end of the line ($).
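
One way to read this suggestion, sketched in Python as an assumption rather than the exact pattern the answer had in mind: keep the original lookahead, but let it accept either ; or $. Note that the .* from the question also has to become lazy (.*?), which is my addition; a greedy .* would run to the end of the line and take format=flowed with it.

```python
import re

# Lazy ".*?" expands one character at a time until the lookahead sees
# either ";" or the end of the string.
CHARSET_RE = re.compile(r"(?<=charset=).*?(?=;|$)")

# The three sample headers from the question.
samples = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]

results = [CHARSET_RE.search(s).group(0) for s in samples]
print(results)  # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']

# For comparison: the greedy version overshoots on the first header.
greedy = re.search(r"(?<=charset=).*(?=;|$)", samples[0]).group(0)
print(greedy)  # ISO-8859-1; format=flowed
```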

Sjoerd