I decided to write my own answer as an attempt to combine and clarify the answers above (which heavily helped me solve my problems).
I'd say there are two approaches to this problem.
Situation 1: you know which metadata the file contains (which metadata you're interested in).
In this case, lets say you have a list of strings, which contains the metadata you're interested in. I assume here that these tags are correct (i.e. you're not interested in the number of pixels of a .txt file).
metadata = ['Name', 'Size', 'Item type', 'Date modified', 'Date created']
Now, using the code provided by Greedo and Roger Upole I created a function which accepts the file's full path and name separately and returns a dictionary containing the metadata of interest:
def get_file_metadata(path, filename, metadata):
# Path shouldn't end with backslash, i.e. "E:\Images\Paris"
# filename must include extension, i.e. "PID manual.pdf"
# Returns dictionary containing all file metadata.
sh = win32com.client.gencache.EnsureDispatch('Shell.Application', 0)
ns = sh.NameSpace(path)
# Enumeration is necessary because ns.GetDetailsOf only accepts an integer as 2nd argument
file_metadata = dict()
item = ns.ParseName(str(filename))
for ind, attribute in enumerate(metadata):
attr_value = ns.GetDetailsOf(item, ind)
if attr_value:
file_metadata[attribute] = attr_value
return file_metadata
# *Note: you must know the total path to the file.*
# Example usage:
if __name__ == '__main__':
folder = 'E:\Docs\BMW'
filename = 'BMW series 1 owners manual.pdf'
metadata = ['Name', 'Size', 'Item type', 'Date modified', 'Date created']
print(get_file_metadata(folder, filename, metadata))
Results with:
{'Name': 'BMW series 1 owners manual.pdf', 'Size': '11.4 MB', 'Item type': 'Foxit Reader PDF Document', 'Date modified': '8/30/2020 11:10 PM', 'Date created': '8/30/2020 11:10 PM'}
Which is correct, as I just created the file and I use Foxit PDF reader as my main pdf reader.
So this function returns a dictionary, where the keys are the metadata tags and the values are the values of those tags for the given file.
Situtation 2: you don't know which metadata the file contains
This is a somewhat tougher situation, especially in terms of optimality. I analyzed the code proposed by Roger Upole, and well, basically, he attempts to read metadata of a None
file, which results in him obtaining a list of all possible metadata tags. So I thought it might just be easier to hardcopy this list and then attempt to read every tag. That way, once you're done, you'll have a dictionary containing all tags the file actually possesses.
Simply copy what I THINK is every possible metadata tag and just attempt to obtain all the tags from the file.
Basically, just copy this declaration of a python list, and use the code above (replace metadata with this new list):
metadata = ['Name', 'Size', 'Item type', 'Date modified', 'Date created', 'Date accessed', 'Attributes', 'Offline status', 'Availability', 'Perceived type', 'Owner', 'Kind', 'Date taken', 'Contributing artists', 'Album', 'Year', 'Genre', 'Conductors', 'Tags', 'Rating', 'Authors', 'Title', 'Subject', 'Categories', 'Comments', 'Copyright', '#', 'Length', 'Bit rate', 'Protected', 'Camera model', 'Dimensions', 'Camera maker', 'Company', 'File description', 'Masters keywords', 'Masters keywords']
I don't think this is a great solution, but on the other hand, you can keep this list as a global variable and then use it without needing to pass it to every function call. For the sake of completness, here is the output of the previous function using this new metadata list:
{'Name': 'BMW series 1 owners manual.pdf', 'Size': '11.4 MB', 'Item type': 'Foxit Reader PDF Document', 'Date modified': '8/30/2020 11:10 PM', 'Date created': '8/30/2020 11:10 PM', 'Date accessed': '8/30/2020 11:10 PM', 'Attributes': 'A', 'Perceived type': 'Unspecified', 'Owner': 'KEMALS-ASPIRE-E\\kemal', 'Kind': 'Document', 'Rating': 'Unrated'}
As you can see, the dictionary returned now contains all the metadata that the file contains.
The reason this works is because of the if statement:
if attribute_value:
which means that whenever an attribute is equal to None
, it won't be added to the returning dictionary.
I'd underline that in case of processing many files it would be better to declare the list as a global/static variable, instead of passing it to the function every time.