On Windows, one approach is to look up the Content Type associated with the file extension in the Windows Registry. (I do not know of a way to do this without a direct registry lookup.)
Within the registry, file extensions that are text-based should generally have one or more of these characteristics:
- A Content Type indicating a MIME primary type of text, e.g., text/plain or text/html
- A Perceived Type of text
- A PersistentHandler subkey whose default value is the GUID {5e941d80-bf96-11cd-b579-08002b30bfeb}, which is assigned to the plain text persistent handler.
The following method returns every registered extension that has at least one of these characteristics:
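using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Win32;

static IEnumerable<string> GetTextExtensions()
{
    // GUID assigned to the plain text persistent handler.
    const string PlainTextHandler = "{5e941d80-bf96-11cd-b579-08002b30bfeb}";
    var comparison = StringComparison.OrdinalIgnoreCase;
    var root = Registry.ClassesRoot;

    // File extensions are the HKEY_CLASSES_ROOT subkeys whose names start with a dot.
    foreach (var s in root.GetSubKeyNames()
                          .Where(a => a.StartsWith(".", StringComparison.Ordinal)))
    {
        using (RegistryKey subkey = root.OpenSubKey(s))
        {
            // OpenSubKey returns null if the key no longer exists.
            if (subkey == null)
                continue;

            if (subkey.GetValue("Content Type")?.ToString().StartsWith("text/", comparison) == true)
                yield return s;
            else if (subkey.GetValue("PerceivedType")?.ToString().Equals("text", comparison) == true)
                yield return s;
            else
            {
                // Fall back to the default value of the PersistentHandler subkey.
                using (var ph = subkey.OpenSubKey("PersistentHandler"))
                {
                    if (ph?.GetValue("")?.ToString().Equals(PlainTextHandler, comparison) == true)
                        yield return s;
                }
            }
        }
    }
}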
The output depends on the machine's configuration; on my current machine it returns:
.a, .AddIn, .ans, .asc, .asm, .asmx, .aspx, .asx, .bas, .bat, .bcp, .c, .cc, .cd, .cls, .cmd, ...
While this depends on application installers correctly mapping file extensions, it appears to identify most of the major text file types.
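As a rough sketch of how such a list might be used, the extensions can be loaded once into a case-insensitive set and individual file names checked against it. The class name TextFileCheck and the method HasTextExtension are hypothetical names introduced here for illustration; they are not part of the original method, and the check looks only at the file name, never at the file's contents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TextFileCheck
{
    // Hypothetical helper (an assumption, not part of the original answer):
    // returns true when the extension of 'path' is in the supplied set.
    // Build the set with StringComparer.OrdinalIgnoreCase so that ".TXT"
    // and ".txt" are treated as the same extension.
    public static bool HasTextExtension(string path, ISet<string> textExtensions)
    {
        // Path.GetExtension includes the leading dot and returns an empty
        // string when the file name has no extension.
        string ext = Path.GetExtension(path);
        return !string.IsNullOrEmpty(ext) && textExtensions.Contains(ext);
    }
}
```

On a Windows machine the set could be filled with new HashSet<string>(GetTextExtensions(), StringComparer.OrdinalIgnoreCase), so the registry is scanned once rather than on every check.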