I am looking to extract the file extension, if it exists, from web addresses. I am trying to identify which links point to files with extensions I do not want, e.g. .jpg, .exe, etc.
So, from the URL www.example.com/image.jpg I would want to extract the extension jpg, and I also need to handle cases where there is no extension, such as www.example.com/file (i.e. return nothing).
I can't think of a clean way to implement this. One idea is to take everything after the last dot. If the URL has an extension, I can look that extension up in my list. If it doesn't, as with www.example.com/file, this returns com/file, which is fine because it is not in my list of excluded file extensions.
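To make the last-dot idea concrete, here is a minimal sketch in Python. The language choice, the function names `after_last_dot` and `is_excluded`, and the `EXCLUDED` set are all placeholders of mine for illustration only.

```python
# Hypothetical list of unwanted extensions (lower case, without the dot).
EXCLUDED = {"jpg", "exe"}


def after_last_dot(url: str) -> str:
    """Return everything after the last dot, or '' if the URL has no dot."""
    _, dot, tail = url.rpartition(".")
    return tail if dot else ""


def is_excluded(url: str) -> bool:
    """True if the text after the last dot is in the excluded list."""
    return after_last_dot(url).lower() in EXCLUDED


print(after_last_dot("www.example.com/image.jpg"))  # jpg
print(after_last_dot("www.example.com/file"))       # com/file
```

This would misfire on URLs with a query string or fragment after the file name (for example `image.jpg?size=large`), which is part of why I suspect there is a better way.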
There may be a better alternative using a package I am not aware of, one that can tell what is and isn't an actual extension (i.e. cope with URLs that have no extension at all).
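As a rough sketch of what I imagine such an approach might look like, again assuming Python: the standard library can split the URL into parts first, so that only the path is inspected. The function name `url_extension` is hypothetical, and prepending `//` to bare addresses is my own assumption so that the host is not mistaken for part of the path.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the lower-cased extension of the URL's path, or '' if none."""
    # urlsplit only recognises the host when a scheme or a leading '//'
    # is present, so add '//' to bare addresses like www.example.com/file.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    path = urlsplit(url).path           # drops host, query string and fragment
    ext = posixpath.splitext(path)[1]   # e.g. '.jpg', or '' when there is none
    return ext[1:].lower()              # strip the leading dot


print(url_extension("www.example.com/image.jpg"))  # jpg
print(url_extension("www.example.com/file"))       # (empty string)
```

This only looks at the shape of the path. It does not check whether the result is a real, known file type, so it would still need to be compared against my list of excluded extensions.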