
I am wondering whether files stored in a directory have any particular IDs and, if so, how we can fetch them through Python. I tried the following and got the result below, but I can't see any unique ID anywhere in it.

import os
cur = os.getcwd()

info = os.stat(cur)
print(info)

I got something like this:

os.stat_result (st_mode=33204, st_ino=21511460, st_dev=2049, st_nlink=1, st_uid=1001, st_gid=1001, st_size=378, st_atime=1516787918, st_mtime=1516787918, st_ctime=1516787918)

I have read that st_ino is the inode number, but when I tried it in my program I observed that it also changes sometimes.
If such IDs are available, can we fetch the file with those IDs?

EDIT: Just as variables have an id, I wonder whether there is something similar in the file system that is assigned when a file is created.

Vikas Periyadath
  • You can check the CRC of the file; this would be the best idea, I guess. – F. Leone Jan 24 '18 at 10:27
  • What is the syntax to check the CRC? – Vikas Periyadath Jan 24 '18 at 10:28
  • `filepath/filename` is unique. – Psytho Jan 24 '18 at 10:28
  • If the file path changes, how can I find the same file? – Vikas Periyadath Jan 24 '18 at 10:32
  • @VikasDamodar It's not the same then! The content is the same, not the file! – Psytho Jan 24 '18 at 10:38
  • Please avoid CRC! As a signature it will cause many collisions and is very error-prone. Better to use SHA, or even the old MD5, for instance. – kriss Jan 24 '18 at 11:01
  • @Psytho That is right. But what I actually want to know is whether there are any global IDs representing each file and, if so, whether we can fetch the file with that ID (if I don't know the path and file name but know the ID)? – Vikas Periyadath Jan 24 '18 at 11:43
  • Check https://unix.stackexchange.com/questions/92816/can-a-file-be-retrieved-by-its-inode Inodes are unique numbers within a file system, but file access functions can't use them to access files, because the file system security model relies on directory traversal (well, on Unix, and it's a bit more complicated). – kriss Jan 25 '18 at 09:51
  • https://stackoverflow.com/questions/36092559/open-file-by-inode – kriss Jan 25 '18 at 09:51
  • https://github.com/angrave/SystemProgramming/wiki/File-System,-Part-2:-Files-are-inodes-(everything-else-is-just-data...) – kriss Jan 25 '18 at 09:52

1 Answer


The nearest thing to an ID for a file (some unique number identifying the file in the filesystem) is called an inode number. That is indeed the number returned by stat in the field st_ino.

This number may change in some circumstances even if the name of the file does not, for instance when the file is replaced by another one (a copy), or deleted and recreated.

This number won't change if you merely open the file and perform reads and writes over it.
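As a rough illustration of the point above, here is a minimal sketch that reads the inode number of a file before and after appending to it. The helper name `file_identity` and the file name `example.txt` are made up for this example, and pairing `st_ino` with `st_dev` is an assumption based on the fact that inode numbers are only unique within one device.

```python
import os
import tempfile


def file_identity(path):
    """Return (st_dev, st_ino) for the file at path."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


# Demo on a throwaway file: writing in place should keep the same identity.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("first line\n")
    before = file_identity(path)

    with open(path, "a") as f:  # append to the existing file
        f.write("second line\n")
    after = file_identity(path)

    print(before, after, before == after)
```

Note that the path passed to `os.stat()` must be the file itself; passing a directory returns the inode of the directory.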

Check here for a more detailed explanation of inodes: https://github.com/angrave/SystemProgramming/wiki/File-System,-Part-2:-Files-are-inodes-(everything-else-is-just-data...)

Also notice that not all filesystems have inodes; this is a concept that originated on Unix. There is no such thing with vfat.

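If you are only interested in the filename, another way to get a unique number is to expand the filename to its full path up to the filesystem root (or drive on Windows), then call hash() on the string. Note that Python 3 salts hash() for strings per process by default, so the resulting number is only comparable within a single run.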
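A minimal sketch of the path-based approach, assuming an identifier that stays the same across runs is wanted, which is why it swaps in `hashlib.sha256` in place of `hash()`. The function name `path_id` is made up for this example.

```python
import hashlib
import os


def path_id(path):
    """Hash the absolute, normalized form of a path into a hex string."""
    full_path = os.path.abspath(path)
    # os.fsencode turns the path into bytes using the filesystem encoding.
    return hashlib.sha256(os.fsencode(full_path)).hexdigest()


print(path_id("notes.txt"))  # hypothetical file name; it need not exist
```

`os.path.abspath` only normalizes the string; it does not resolve symbolic links, so two paths reaching the same file through a link get different identifiers.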

What you lose by doing that is that on some filesystems a given file on disk may be reached using several names (hard links or soft links; I won't expand here on the differences). Depending on your use case it may or may not be a problem.
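To make the several-names case concrete, here is a small sketch that creates a hard link and checks that both names refer to the same file. It assumes a filesystem and platform where `os.link` is permitted; the function name `demo_hard_link` and the file names are made up for this example.

```python
import os
import tempfile


def demo_hard_link(directory):
    """Create a file plus a hard link to it; return (same_file, link_count)."""
    original = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("data\n")
    os.link(original, alias)  # second name for the same file
    # samefile compares the device and inode numbers of both paths.
    same_file = os.path.samefile(original, alias)
    link_count = os.stat(original).st_nlink
    return same_file, link_count


with tempfile.TemporaryDirectory() as tmp:
    print(demo_hard_link(tmp))
```

The two paths are different strings, so any identifier derived from the path alone would treat them as two files.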

If you are looking for files with the same content, that's yet another story. Filesystems don't care about the content of a file. To know whether two files are identical you'll have to open them and compare them. In Python, have a look at the filecmp module.
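A short sketch of a content comparison with `filecmp`. The wrapper name `same_content` is made up for this example; passing `shallow=False` is a choice made here so that the file contents are actually read instead of relying on `os.stat()` information alone.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if both files hold identical bytes."""
    # shallow=False forces a comparison of the file contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, `same_content("a.txt", "copy_of_a.txt")` (hypothetical file names) would return True for an exact copy, whatever the two files are called.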

A common way to compare many files is to rely on a hash signature of the file content. For instance, have a look at the answer "Generating a MD5 signature of a file" to see how to do that for MD5 (a bit outdated, but easily adapted to more modern signatures).
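A minimal sketch of such a content signature, using SHA-256 from `hashlib` as one example of a more modern signature. The function name `content_digest` and the chunk size are assumptions made for this example.

```python
import hashlib


def content_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Files with equal digests can then be grouped, for instance in a dict keyed by digest, without comparing every pair of files directly.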

kriss
  • I checked `st_ino`. It is different when I pass `cur` as above and when I pass a filename instead of `cur`. – Vikas Periyadath Jan 24 '18 at 11:46
  • If you provide a directory name (which is what `os.getcwd()` returns), then you get the inode of the directory. If you want the inode of a file, provide the path to that file to `stat()`. You should also take the st_dev field into account; inodes are only unique per device. – kriss Jan 25 '18 at 09:43
  • It should be noted that the inode number is the same for hard links; thus, while an inode ideally is a unique identifier, multiple filenames can point to the same inode (aka chunk of data on the filesystem). – Sergiy Kolodyazhnyy Sep 09 '18 at 06:33
  • @Sergiy Kolodyazhnyy: or you can see it the other way around: one unique file in the system can have multiple names (paths). That's how I understand things: the file is actually what you call the "chunk of data", regardless of its name (path). – kriss Sep 10 '18 at 08:53
  • @kriss Yup, exactly. Or a file may not have a name at all, as in [anonymous inodes](https://stackoverflow.com/questions/4508998/what-is-an-anonymous-inode-in-linux) – Sergiy Kolodyazhnyy Sep 10 '18 at 08:55