I have a Content-Disposition header as such:
Content-Disposition: attachment; filename="övrigt.xlsx"; filename*=utf-8''%C3%B6vrigt.xlsx
According to the spec, the header can contain filename="filename.extension" and/or filename*=charencoding''filename.extension. When filename* is present, it should be used over filename.
So I want to capture the filename and the character encoding from the filename* attribute, in preference to the filename attribute, whenever filename* is present. I ended up with this regex:
filename\*?=(?:([^'"]*)''|("))([^;]+)\2(?:[;`\n]|$)
It works fine; the only problem is that it matches whichever comes first, filename* or filename:
attachment; filename*=utf-8''%C3%B6vrigt.xlsx; filename="övrigt.xlsx"
Matches:
Match 1
Full match 12-45 filename*=utf-8''%C3%B6vrigt.xlsx;
Group 1. n/a utf-8
Group 3. n/a %C3%B6vrigt.xlsx
attachment; filename="övrigt.xlsx"; filename*=utf-8''%C3%B6vrigt.xlsx
Matches:
Match 1
Full match 12-35 filename="övrigt.xlsx";
Group 2. n/a "
Group 3. n/a övrigt.xlsx
Group 1 always matches the character encoding when present.
Group 3 always matches the filename.
So I can now use the filename from group 3 and decode it when group 1 is not empty.
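To make the decoding step concrete, here is a minimal Python sketch of what I mean. The function name decode_filename and its parameters are made up for illustration: charset stands for whatever ended up in group 1 and raw_name for group 3. It assumes the filename* value is percent-encoded, as in the header above.

```python
from urllib.parse import unquote


def decode_filename(charset, raw_name):
    """Return the file name, percent-decoding it only when a charset was captured.

    charset  -- contents of group 1 (None or "" when only filename= matched)
    raw_name -- contents of group 3
    """
    if charset:
        # filename*= form: the value is percent-encoded in the given charset
        return unquote(raw_name, encoding=charset)
    # plain filename= form: use the value as-is
    return raw_name


print(decode_filename("utf-8", "%C3%B6vrigt.xlsx"))  # övrigt.xlsx
print(decode_filename(None, "övrigt.xlsx"))          # övrigt.xlsx
```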
So to get to the question:
As I understand it, the \*? in the regex should greedily try to match filename with the * (see the reference here):
The question mark is the first metacharacter introduced by this tutorial that is greedy. The question mark gives the regex engine two choices: try to match the part the question mark applies to, or do not try to match it. The engine always tries to match that part. Only if this causes the entire regular expression to fail, will the engine try ignoring the part the question mark applies to.
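A stripped-down reproduction, as a Python sketch. It uses Python's re module rather than the flavor I tested with, and only the filename\*?= prefix of my regex, so treat it as an approximation. The names pattern, header_a and header_b are just for this example. It shows the same thing: the result seems to depend on which attribute appears first in the header.

```python
import re

# Only the prefix of the full regex: "filename", an optional "*", then "="
pattern = re.compile(r"filename\*?=")

# Same two headers as above, with the attributes in either order
header_a = "attachment; filename*=utf-8''%C3%B6vrigt.xlsx; filename=\"övrigt.xlsx\""
header_b = "attachment; filename=\"övrigt.xlsx\"; filename*=utf-8''%C3%B6vrigt.xlsx"

# search() returns the first match found scanning from left to right
print(pattern.search(header_a).group())  # filename*=
print(pattern.search(header_b).group())  # filename=
```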
Why does it not work as expected? What am I doing wrong? How can I match filename*= in preference to filename= when it is present?