I want to identify the file format of the input file given to my shell script: whether it is a .pst or a .dbx file. I checked "How to check the extension of a filename in a bash script?". That one deals with txt files, and two methods are given there:
check if the extension is txt
check if the MIME type is application/text, etc.
I tried file -ib <filename> on a .pst and a .dbx file, and it showed application/octet-stream for both. However, if I just run file <filename>, then I get this for the dbx file -
file1.dbx: Microsoft Outlook Express DBX File Message database
and this for the pst file -
file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)
So, my questions are -
Is it better to use MIME type detection every time, given that the output can be anything and we need a proper check?
How do I apply a MIME type check in this case, where both files return "application/octet-stream"?
Update
I didn't want to do extension-based detection because, on a Unix system, we can't be sure that a .dbx file truly is a dbx file. file <filename> returns a line that contains the correct information about the file (e.g. "Microsoft Outlook Express DBX File Message database"), which means the file command is able to identify the file type properly. Then why does it not report the correct information with the file -ib <filename> command?
Will parsing the string output of file <filename> be fine? Is it advisable, assuming I only need to identify a narrow set of data storage files from the Outlook family (MS Outlook Express, MS Office Outlook 2003, 2007, 2010, etc.)? A small text identifier like application/dbx that could be compared would be all I need.
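To make the idea concrete, here is a rough sketch of the kind of string parsing I have in mind. It assumes the description strings shown above stay the same across versions of file, which I have not verified. The function names classify_description and detect_mail_store are hypothetical, and application/dbx and application/pst are labels I made up for comparison, not registered MIME types.

```shell
#!/bin/sh
# Sketch only: maps the human-readable description printed by file(1)
# to a short, made-up label. The patterns are based on the two sample
# outputs shown in the question and may not cover other file versions.

# classify_description (hypothetical helper): takes a description string
# and prints a short identifier for it.
classify_description() {
    case "$1" in
        *"Outlook Express DBX"*)
            echo "application/dbx" ;;   # made-up label, not a real MIME type
        *"Microsoft Outlook"*"email folder"*)
            echo "application/pst" ;;   # made-up label, not a real MIME type
        *)
            echo "unknown" ;;
    esac
}

# detect_mail_store (hypothetical helper): runs file in brief mode (-b),
# so the filename prefix is omitted, and classifies the description.
detect_mail_store() {
    classify_description "$(file -b "$1")"
}
```

In the script this would be called as detect_mail_store "$1", and the result compared against the labels above.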