I have a file whose extension is unknown. Is there a way to determine the file type from the file header?
2 Answers
1
Usage:
GetFileHeader.isPDF(filename)
The GetFileHeader class is written in Java, but you can call it from Kotlin.
import java.io.File;
import java.io.FileInputStream;

public class GetFileHeader {
private static final int PDF_MAGIC[] = new int[] { 0x25, 0x50, 0x44, 0x46};
private static final int DOC1_MAGIC[] = new int[] { 0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1};
private static final int DOC2_MAGIC[] = new int[] { 0x0d, 0x44, 0x4f, 0x43};
private static final int DOC3_MAGIC[] = new int[] { 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1, 0x00};
private static final int DOC4_MAGIC[] = new int[] { 0xdb, 0xa5, 0x2d, 0x00};
private static final int DOC5_MAGIC[] = new int[] { 0xec, 0xa5, 0xc1, 0x00};
private static final int DOCX1_MAGIC[] = new int[] { 0x50, 0x4b, 0x03, 0x04};
private static final int DOCX2_MAGIC[] = new int[] { 0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
public static boolean isPDF(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < PDF_MAGIC.length; ++i) {
if(ins.read() != PDF_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDoc1(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOC1_MAGIC.length; ++i) {
if(ins.read() != DOC1_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDoc2(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOC2_MAGIC.length; ++i) {
if(ins.read() != DOC2_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDoc3(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOC3_MAGIC.length; ++i) {
if(ins.read() != DOC3_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDoc4(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOC4_MAGIC.length; ++i) {
if(ins.read() != DOC4_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDoc5(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOC5_MAGIC.length; ++i) {
if(ins.read() != DOC5_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDocX1(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOCX1_MAGIC.length; ++i) {
if(ins.read() != DOCX1_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
public static boolean isDocX2(File filename) throws Exception {
FileInputStream ins = new FileInputStream(filename);
try {
for(int i = 0; i < DOCX2_MAGIC.length; ++i) {
if(ins.read() != DOCX2_MAGIC[i]) {
return false;
}
}
return true;
} finally {
ins.close();
}
}
}
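
The eight methods above differ only in which array they compare against, so the same check could be written once. The following is a minimal sketch, not part of the original answer: the class name `MagicCheck` and the method `hasMagic` are hypothetical, and it assumes the signature starts at offset 0 of the file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (assumed name), equivalent in behavior to the isXxx methods above.
final class MagicCheck {
    // "%PDF" in ASCII, the same bytes as PDF_MAGIC in the answer.
    static final int[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    // Returns true when the file begins with every value in magic, in order.
    static boolean hasMagic(File file, int[] magic) throws IOException {
        // try-with-resources closes the stream on every exit path.
        try (InputStream in = new FileInputStream(file)) {
            for (int expected : magic) {
                // read() yields 0..255, or -1 at end of file, which never matches.
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

With this, `MagicCheck.hasMagic(file, MagicCheck.PDF_MAGIC)` plays the role of `GetFileHeader.isPDF(file)`, and the other arrays can be passed in the same way.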

Katana
-1
Do you have a good guess? There's no single standard file header that every format uses. Some formats (like .txt) have no header at all; the rest each define their own. Sometimes even the header isn't enough: there are multiple subheaders for different types of .wav and .bmp files. If you have a guess, you can test it against whatever header that format uses, but if you don't even have a guess, you're not going to get anywhere.
The header of a file isn't metadata, it's just (usually) the first N bytes of the file.
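
To see what "the first N bytes" look like for a given file, they can be read and printed as hex, then compared by hand with a format's documented signature. This is only an illustrative sketch: `HeaderDump`, `firstBytes`, and `toHex` are assumed names, not anything from the answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical utility (assumed names) for inspecting the start of a file.
final class HeaderDump {
    // Reads up to n leading bytes; returns fewer if the file is shorter than n.
    static byte[] firstBytes(Path path, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of file reached before n bytes
                }
                total += r;
            }
        }
        return Arrays.copyOf(buf, total);
    }

    // Formats bytes as lowercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

For a file that begins with the ASCII text `%PDF`, `HeaderDump.toHex(HeaderDump.firstBytes(path, 4))` gives `25 50 44 46`.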

Gabe Sechan
- I want to know whether the file is a pdf, doc, docx, zip, or rar. – karma Jan 16 '22 at 07:14
- This answer is for Java, and it is also incomplete: https://stackoverflow.com/a/29033128/16474396 – karma Jan 16 '22 at 07:15
- OK, if you have a fixed list like that, it's doable. You'd just need to check each one individually. Luckily, everything on that list is a well-known, documented format, so it shouldn't be too hard to find the specs. For example, a zip file normally starts with the signature 0x04034b50 in its first 4 bytes (stored little-endian, so the bytes on disk are 50 4B 03 04). Remember that while passing a check like that means it's very likely a zip file, there is a chance that it's not a valid one, or that it's some other format that just happens to have those first 4 bytes. – Gabe Sechan Jan 16 '22 at 07:20
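
Checking a fixed list "each one individually" might look like the sketch below. It is an illustration under stated assumptions, not a definitive detector: `FileSniffer` and `sniff` are hypothetical names, the labels are arbitrary strings, and only leading signatures are compared. A docx is a zip container, so it shares the zip signature, and a legacy doc shares the OLE2 compound-file signature with other old Office formats; telling those apart would require looking inside the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sniffer (assumed names); matches leading bytes only.
final class FileSniffer {
    private static final int[] PDF  = {0x25, 0x50, 0x44, 0x46};             // "%PDF"
    private static final int[] OLE2 = {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1};
    private static final int[] ZIP  = {0x50, 0x4b, 0x03, 0x04};             // "PK\3\4"
    private static final int[] RAR  = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07}; // "Rar!" + 1A 07

    // Returns a coarse label for the file, or "unknown" if no signature matches.
    static String sniff(Path path) throws IOException {
        byte[] head = new byte[8]; // length of the longest signature above
        int len = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int r;
            while (len < head.length
                    && (r = in.read(head, len, head.length - len)) != -1) {
                len += r;
            }
        }
        if (startsWith(head, len, PDF))  return "pdf";
        if (startsWith(head, len, OLE2)) return "ole2 (doc and other legacy Office files)";
        if (startsWith(head, len, ZIP))  return "zip (including docx)";
        if (startsWith(head, len, RAR))  return "rar";
        return "unknown";
    }

    // True when the first magic.length bytes of head equal magic.
    private static boolean startsWith(byte[] head, int len, int[] magic) {
        if (len < magic.length) {
            return false; // file is shorter than the signature
        }
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xff) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

As the comment above notes, a match only makes the format likely; the rest of the file can still be invalid or something else entirely.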
- Honestly, most apps just look at the extension and trust it. That's right the vast majority of the time. Just code defensively if you actually want to parse and use the data, to ensure that nobody can trick you into some sort of memory access exploit. – Gabe Sechan Jan 16 '22 at 07:23