Given byte[] peek, where peek is N bytes from a text file, how can I determine if peek is XML?
Is it enough to just check for a < at the start of the string?
To determine whether a given string is XML, you need a parser (for Java, read this). This is the only way to get an exact answer.
Checking the first few bytes for <?xml only gives you a guess as to whether the content is valid XML. You cannot be absolutely sure until you have parsed it to the end.
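As a rough illustration of the parser approach, here is a minimal sketch using the SAX parser that ships with the JDK. The class and method names (XmlWellFormedCheck, isWellFormedXml) are made up for this example, and it assumes you have the complete content in memory rather than only the first N bytes. It checks well-formedness only, not validity against a DTD or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class XmlWellFormedCheck {

    // Returns true only if the complete input parses as well-formed XML.
    static boolean isWellFormedXml(byte[] content) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Substitute an empty source so external DTDs/entities are not fetched.
                    return new InputSource(new StringReader(""));
                }
            };
            // Well-formedness violations surface as a SAXException (fatal error).
            factory.newSAXParser().parse(new ByteArrayInputStream(content), handler);
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }
}
```

Passing a truncated peek buffer to this method will usually return false even for a real XML file, because the document is cut off before its root element closes. That is why a full parse needs the whole file.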
According to the XML standard, your files should begin with <?xml to make it possible to tell that they are XML. If you have chosen not to follow that recommendation, there is no reliable way to tell. Any test that looks at only the first few bytes will be passed by some non-XML files (for example, those that happen to start with <) and failed by others. Also note that a valid XML file may begin with a Unicode BOM character, so be sure to take that into account if you are going to go ahead and try this.
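If you do go ahead with the heuristic, a sketch along these lines might do. The names XmlSniffer, looksLikeXml and hasPrefix are invented for this example. It assumes the content is UTF-8 unless a UTF-16 BOM says otherwise, and it ignores UTF-32 (whose little-endian BOM also starts with FF FE). A result of false means only "no XML declaration seen", not "definitely not XML".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class XmlSniffer {

    // Heuristic: true if peek, after an optional BOM, starts with "<?xml".
    static boolean looksLikeXml(byte[] peek) {
        Charset charset = StandardCharsets.UTF_8; // assumed default when no BOM is present
        int offset = 0;
        if (hasPrefix(peek, 0xEF, 0xBB, 0xBF)) {        // UTF-8 BOM
            offset = 3;
        } else if (hasPrefix(peek, 0xFE, 0xFF)) {       // UTF-16 big-endian BOM
            charset = StandardCharsets.UTF_16BE;
            offset = 2;
        } else if (hasPrefix(peek, 0xFF, 0xFE)) {       // UTF-16 little-endian BOM
            charset = StandardCharsets.UTF_16LE;
            offset = 2;
        }
        String head = new String(peek, offset, peek.length - offset, charset);
        // The XML declaration must be the very first thing, so no trimming here.
        return head.startsWith("<?xml");
    }

    // True if data begins with the given unsigned byte values.
    private static boolean hasPrefix(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Checking for <?xml rather than a bare < avoids matching HTML and similar formats, at the cost of missing XML files that omit the declaration.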