I need to accept a list of file names in a query string, for example:

http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc

Do you have any recommendations on what delimiter to use?

Matthew Cole

8 Answers

Repeating a query parameter is legal, and it is the only way to guarantee there are no parsing problems in all cases:

http://someSite/someApp/myUtil.ashx?file=file1.txt&file=file2.bmp&file=file3.doc
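A minimal sketch of the repeated-parameter approach. It is written in Python purely for illustration (the question's .ashx handler would be ASP.NET); the parameter name file comes from the URL above, and the "tricky" filenames are made up. The standard library builds and reads repeated keys without any custom delimiter:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

files = ["file1.txt", "file2.bmp", "file3.doc"]

# doseq=True emits one "file=..." pair per list element
query = urlencode({"file": files}, doseq=True)
url = "http://someSite/someApp/myUtil.ashx?" + query
print(url)  # http://someSite/someApp/myUtil.ashx?file=file1.txt&file=file2.bmp&file=file3.doc

# parse_qs collects every value of a repeated key into a list, in order
parsed = parse_qs(urlsplit(url).query)
print(parsed["file"])  # ['file1.txt', 'file2.bmp', 'file3.doc']

# Filenames containing reserved characters are percent-encoded, so they
# cannot be confused with the '&' and '=' delimiters
tricky = ["a&b.txt", "c;d=e.txt"]
tricky_query = urlencode({"file": tricky}, doseq=True)
print(tricky_query)  # file=a%26b.txt&file=c%3Bd%3De.txt
assert parse_qs(tricky_query)["file"] == tricky
```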

The semicolon (;) must be percent-encoded (as %3B) when it is part of a filename, but not when it separates query parameters, which is its reserved use.

See section 2.2 of RFC 3986:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

 reserved    = gen-delims / sub-delims

 gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

 sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / ";" / "="
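
To illustrate the percent-encoding rule quoted above, here is a small Python sketch (the filename is a made-up example) that encodes a name containing the sub-delims ";", "&" and "=" so that none of them survive as literal delimiters:

```python
from urllib.parse import quote, unquote

# A filename containing the reserved sub-delims ';', '&' and '='
name = "a;b&c=d.txt"

# safe="" leaves no reserved character unencoded (quote keeps "/" by default)
encoded = quote(name, safe="")
print(encoded)  # a%3Bb%26c%3Dd.txt

# In the finished query, the only literal '=' is the real delimiter
query = "file=" + encoded
print(query)  # file=a%3Bb%26c%3Dd.txt

# Decoding restores the original filename
assert unquote(encoded) == name
```
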
Community
Cory Kendall

If they're filenames, a good choice would be a character that is disallowed in filenames. Suggestions so far include ",", "|" and "&", which are generally allowed in filenames and could therefore lead to ambiguities. "/", on the other hand, is not allowed in filenames, either on Windows or on Unix-like systems. It is allowed in URIs, and it has no special meaning in query strings.

Example:

http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc is bad because it may refer to the valid file file1.txt|file2.bmp.

http://someSite/someApp/myUtil.ashx?files=file1.txt/file2.bmp/file3.doc unambiguously refers to 3 files.
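A rough Python sketch of the "/" delimiter, assuming the values are bare filenames rather than paths (encode_files and decode_files are hypothetical helper names). Passing safe="/" to urlencode keeps the delimiter readable, while characters such as "|" inside a name are still percent-encoded and decode back intact:

```python
from urllib.parse import parse_qs, urlencode

def encode_files(names):
    # Assumes bare filenames: "/" cannot occur inside a filename on
    # Windows or on Unix-like systems, so it cannot be ambiguous
    for name in names:
        if "/" in name:
            raise ValueError(f"not a bare filename: {name!r}")
    # safe="/" keeps the delimiter readable instead of encoding it as %2F
    return urlencode({"files": "/".join(names)}, safe="/")

def decode_files(query):
    # parse_qs undoes the percent-encoding; then split on the delimiter
    return parse_qs(query)["files"][0].split("/")

query = encode_files(["file1.txt", "file2.bmp", "file3.doc"])
print(query)  # files=file1.txt/file2.bmp/file3.doc

# A '|' inside a (Unix) filename is percent-encoded and survives intact
odd = encode_files(["file1.txt|file2.bmp", "file3.doc"])
print(odd)  # files=file1.txt%7Cfile2.bmp/file3.doc
print(decode_files(odd))  # ['file1.txt|file2.bmp', 'file3.doc']
```

This only works while the values really are bare filenames; a list of paths would need a different delimiter.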

MSalters
  • This answer is mostly incorrect. The | character is not valid in URLs, per official specs, and may cause various problems if you attempt to use it. The & character is reserved in query strings to separate name/value pairs. The comma may be an OK choice, but more research is needed. – Doug S Apr 12 '14 at 20:06
  • It is dangerous to assume | is not allowed in filenames. Linux allows almost any character in filenames (only / and the NUL byte are excluded), but Windows has a handful of specific exclusions, which include |. – Shawn Kovac Jan 25 '16 at 16:54
  • @DougS: I'm explaining why `|` is a bad choice with regard to filenames. It may also be a bad choice in URLs, but I'm not making any claim about that. So how does that make my answer incorrect? – MSalters Jan 26 '16 at 07:49
  • I notice no one is disputing the use of "/". – RationalRabbit Mar 18 '17 at 18:14

I would recommend making each file its own query parameter, i.e.

myUtil.ashx?file1=file1.txt&file2=file2.bmp&file3=file3.doc

This way you can use standard query-string parsing and loop over file1, file2, file3, ... until the next one is missing.
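A minimal Python sketch of that loop, assuming the numbering starts at 1 with no gaps (numbered_files and its prefix parameter are hypothetical names):

```python
from urllib.parse import parse_qs

def numbered_files(query, prefix="file"):
    params = parse_qs(query)
    files = []
    n = 1
    # Read file1, file2, ... and stop at the first missing index,
    # so the numbering is assumed to start at 1 with no gaps
    while f"{prefix}{n}" in params:
        files.append(params[f"{prefix}{n}"][0])
        n += 1
    return files

print(numbered_files("file1=file1.txt&file2=file2.bmp&file3=file3.doc"))
# ['file1.txt', 'file2.bmp', 'file3.doc']
```

A gap in the numbering (file1, file3) silently truncates the list, so the sender has to number the parameters consecutively.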

Appulus
Andrew Harry

Do you need to list the filenames as a string? Many server-side languages and frameworks (PHP, for example) accept arrays in the query string, so you could write it like

http://someSite/someApp/myUtil.ashx?files[]=file1.txt&files[]=file2.bmp&files[]=file3.doc

If yours doesn't, or you can't use arrays for some other reason, stick to a delimiter that is either not allowed or unusual in a filename. Pipe (|) is a good one. Otherwise, you could URL-encode an invisible character: it is easy to use in code but hard to actually include in a filename.

I usually use arrays when possible and pipe otherwise.
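For illustration, a Python sketch of the files[] form. Note that Python's own parser does not build arrays from the brackets: it simply sees a key literally named "files[]" that repeats, which is enough to recover the list. The bracket-to-array behavior belongs to frameworks such as PHP.

```python
from urllib.parse import parse_qs, urlencode

files = ["file1.txt", "file2.bmp", "file3.doc"]

# '[' and ']' are reserved gen-delims, so urlencode percent-encodes them
query = urlencode({"files[]": files}, doseq=True)
print(query)  # files%5B%5D=file1.txt&files%5B%5D=file2.bmp&files%5B%5D=file3.doc

# Python treats "files[]" as an ordinary (decoded) key; the brackets are only
# a naming convention that some frameworks, such as PHP, turn into an array
parsed = parse_qs(query)
print(parsed["files[]"])  # ['file1.txt', 'file2.bmp', 'file3.doc']
```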

Jimmy Stenke
  • It is dangerous to assume | is not allowed in filenames. Linux allows almost any character in filenames (only / and the NUL byte are excluded), but Windows has a handful of specific exclusions, which include |. – Shawn Kovac Jan 25 '16 at 16:52

I've always used double pipes "||". I don't have any good evidence to back up why this is a good choice other than 10 years of web programming and it's never been an issue.

Rick Hochstetler

This is a common problem. Here is how I handled it: I wrote a method that accepts a list of strings and finds a character that does not occur in any of them (I did this by concatenating the strings and then testing for various candidate characters). Once it finds such a character, it joins all the strings with that character and also prepends it to the result, so the first character of the value tells the receiver which separator was used. For the question's example, one result would be:

http://someSite/someApp/myUtil.ashx?files=|file1.txt|file2.bmp|file3.doc

and another would be:

http://someSite/someApp/myUtil.ashx?files=,file1.txt,file2.bmp,file3.doc

Because the method guarantees that the separator does not occur in any of the strings, it is safe. It was a bit of work to create the first time, but I've used it many times in various applications.
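The original code is lost (see the comment below), so this is only a reconstruction of the described technique in Python, not the author's implementation. The candidate list and the names pack and unpack are assumptions:

```python
# Candidate separators, tried in order (an assumption; any set of characters works)
CANDIDATES = "|,;~!*^"

def pack(items, candidates=CANDIDATES):
    """Join items, prefixing the result with a separator that occurs in none of them."""
    if not items:
        return ""
    # A character absent from the concatenation is absent from every item
    joined = "".join(items)
    for sep in candidates:
        if sep not in joined:
            return sep + sep.join(items)
    raise ValueError("every candidate separator occurs in the items")

def unpack(packed):
    """Read the separator from the first character, then split the rest."""
    if not packed:
        return []
    sep, body = packed[0], packed[1:]
    return body.split(sep)

print(pack(["file1.txt", "file2.bmp", "file3.doc"]))  # |file1.txt|file2.bmp|file3.doc
print(pack(["a|b.txt", "c.doc"]))                     # ,a|b.txt,c.doc
print(unpack(",a|b.txt,c.doc"))                       # ['a|b.txt', 'c.doc']
```

The packed value still has to be percent-encoded (for example with urllib.parse.quote) before it goes into the URL, since "|" is not a valid literal URL character.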

Shawn Kovac
  • I don't know where my original code for this is; I wrote it years ago, maybe a decade ago. If I need this functionality again, I'll rewrite it. In recent years I've just used Andrew Harry's approach: file1=aoeu.txt&file2=snth.txt&file3=aoeu.jpg, then looped until 'file'+n was not found in the query string. – Shawn Kovac Jul 13 '18 at 21:52

I would build on MSalters' answer by generalizing: the best delimiter is one that cannot occur within the items in the list. For example, if your list is prices, a comma is a bad delimiter because it can be confused with the values (1,250.00). For that reason, as most of these answers suggest, I think "|" is probably a good general-purpose delimiter, since it is rarely a valid value. "/" is maybe not the best delimiter in general, since it is sometimes valid in paths.
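One way to enforce that rule, sketched in Python (join_checked is a hypothetical helper): refuse to join when the delimiter already occurs in an item, instead of silently producing an ambiguous string.

```python
def join_checked(items, delim="|"):
    """Join items with delim, refusing any item in which delim already occurs."""
    clashes = [item for item in items if delim in item]
    if clashes:
        raise ValueError(f"delimiter {delim!r} occurs in {clashes!r}")
    return delim.join(items)

print(join_checked(["file1.txt", "file2.bmp"]))  # file1.txt|file2.bmp

# Prices formatted with thousands separators clash with ","
try:
    join_checked(["9.99", "1,250.00"], delim=",")
except ValueError as err:
    print(err)  # delimiter ',' occurs in ['1,250.00']
```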

tdog

I think I would consider using commas or semicolons.

nan
  • Semicolon is a valid replacement for &, so it is not a good choice: "The series of pairs is separated by the ampersand, '&' or semicolon, ';'." (http://en.wikipedia.org/wiki/Query_string#Structure) – dwynne Nov 08 '11 at 08:35
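
To see the risk the comment describes, compare two parsing policies in Python. This assumes Python 3.10+ (or a security-patched earlier release), where parse_qs splits only on "&" by default and takes a separator argument; older versions also split on ";" by default.

```python
from urllib.parse import parse_qs

query = "files=file1.txt;file2.bmp"

# Default policy: only '&' separates pairs, so ';' stays inside the value
print(parse_qs(query))                 # {'files': ['file1.txt;file2.bmp']}

# A parser that treats ';' as a pair separator sees "file2.bmp" as a second,
# value-less field and, by default, drops it
print(parse_qs(query, separator=";"))  # {'files': ['file1.txt']}
```

The same query string yields different file lists depending on the parser, which is why ";" is a fragile delimiter.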