18

I have a PHP application which, on each request, scans a network share for the existence of certain files.

I'm using glob for this, because usually I only know the beginning of the filename.

I noticed that glob does not return files that are currently opened by any client, so my application thinks file_xy does not exist while somebody has it open.

Is there a way to make glob return opened (:= locked?) files as well?

The strange thing is that this is not mentioned anywhere. However, I can confirm that glob is NOT returning files that are currently opened by a client. (As soon as the client closes the accessing application, glob returns the file as usual.)


P.S.: Not even glob("\\server\share\*") returns the file as long as it is opened. (The network share allows the maximum number of concurrent users.)


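    // Single quotes: '\\\\' yields the two leading backslashes of the UNC path.
    $dir = opendir('\\\\server\\share');
    // Compare strictly against false so a file named "0" does not end the loop.
    while (false !== ($file = readdir($dir))) {
      echo $file . "<br />";
    }
    closedir($dir);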

shows the file in question perfectly fine, no matter whether it is opened by another client or not. So I can almost rule out any access limit or permission issue.


I figured out the cause, even though I do not know the reason yet:

The issue with glob() not finding an opened file appears when the file is located on a drive that uses the built-in data deduplication feature of Windows Server 2012 R2.

If I move the file to a non-deduplicated share, glob() can read it, even when it is opened by multiple clients.


Since I have a working alternative, this question should mainly focus on why glob does not work here, or let's say works differently. There has to be a difference in how glob and readdir access the underlying filesystem to determine the contents.


Another Proof

There is further evidence that this relates to data deduplication: I configured the feature to deduplicate only files older than 3 days.

I set up a cron job that opens and globs a certain file on the share. Once the file was about 3 days old (Windows decides when to deduplicate), glob failed to list it while it was opened by another client.

Thus, glob is able to find open files that were copied to the share within the last 3 days, and starts to miss them once they have been deduplicated.
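A probe like the one the cron job runs could be sketched as follows. This is only an illustration, not the script actually used: the helper name `probeListing` and the log call are made up, and it assumes the file name contains no glob metacharacters.

```php
<?php
// Hypothetical helper: reports which listing functions can see $name in $dir.
function probeListing(string $dir, string $name): array
{
    $dir = rtrim($dir, '/\\');

    // glob() may return false on error, so normalise that to an empty array.
    $globbed = glob($dir . DIRECTORY_SEPARATOR . $name) ?: [];
    $viaGlob = in_array($name, array_map('basename', $globbed), true);

    // scandir() returns false if the directory cannot be read.
    $scanned = @scandir($dir);
    $viaScandir = is_array($scanned) && in_array($name, $scanned, true);

    // opendir()/readdir(): walk the entries one by one.
    $viaReaddir = false;
    $handle = @opendir($dir);
    if ($handle !== false) {
        while (false !== ($entry = readdir($handle))) {
            if ($entry === $name) {
                $viaReaddir = true;
                break;
            }
        }
        closedir($handle);
    }

    return ['glob' => $viaGlob, 'scandir' => $viaScandir, 'readdir' => $viaReaddir];
}

// Example call for a scheduled task; the UNC path and file name are placeholders.
// error_log(date('c') . ' ' . json_encode(probeListing('\\\\server\\share', 'file_xy')));
```

Logging the three flags together with a timestamp makes it easy to see at which point glob starts to disagree with readdir.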

Observations

glob

glob fails, causing this post :-)

scandir

Using the mentioned scandir function shows the very same behavior:

  • deduplicated file opened by a client - missing in the resulting array.
  • deduplicated file not opened by a client - part of the resulting array.

opendir / readdir

I want to underline again, that opendir along with readdir works in both cases.
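Since opendir / readdir keeps working, a prefix match can stand in for `glob("/path/abc*")`. A minimal sketch of that idea, assuming a made-up helper name `listByPrefix` that returns bare file names rather than full paths:

```php
<?php
// Hypothetical helper emulating glob($dir . '/' . $prefix . '*') via readdir().
function listByPrefix(string $dir, string $prefix): array
{
    $matches = [];
    $handle = @opendir($dir);
    if ($handle === false) {
        return $matches; // directory could not be opened
    }
    while (false !== ($entry = readdir($handle))) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        // strpos(...) === 0 means the name starts with the prefix (case-sensitive).
        if ($prefix === '' || strpos($entry, $prefix) === 0) {
            $matches[] = $entry;
        }
    }
    closedir($handle);
    // readdir() makes no ordering promise, so sort for a stable result.
    sort($matches, SORT_STRING);
    return $matches;
}
```

Note that this does not reproduce every glob rule (for example, its handling of dot files), so it is a workaround for the simple "starts with" case only.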

RecursiveDirectoryIterator

This produced the expected result at any time as well.

File Attributes

I noted that deduplicated files are shown with a "Size on disk" of 0 bytes, while files that are not yet deduplicated (and are found successfully) are shown with the size they actually occupy (based on the filesystem's cluster size):

However, this would not explain why it makes a difference whether a file is opened by a client or not. The reported size is the same at all times.

(Screenshot: file attributes of a deduplicated and a non-deduplicated file)

dognose
  • Both the executing environment and the environment holding the network share are Windows Server 2012 R2. – dognose Jun 21 '15 at 21:32
  • ps.: not even `glob("\\server\share\*")` is returning the file as long as its opened. (Network Share allows the maximum number of concurrent users) – dognose Jun 21 '15 at 21:40
  • Add the solution as an `answer` and then mark it complete -- so future googlers can find it! – degenerate Jun 24 '15 at 13:46
  • @degenerate There is no solution for now, just knowing the source of the error. Will add some bounty, maybe some `c-ish` guys can figure out why glob might fail for deduplicated data. (I believe it has to do with `reparse points` windows server is using to physically deduplicate data) – dognose Jun 24 '15 at 13:50
  • Are you open to not using glob? – Saeven Jun 24 '15 at 13:58
  • @Saeven Currently I'm using `readdir` along with `strpos` to fake `glob("/path/abc*")`, which works. I'm mainly interested in *why* glob is failing here. – dognose Jun 24 '15 at 14:04
  • I see. Well, the glob source is here: https://github.com/php/php-src/blob/master/win32/glob.c You can see there are a number of break conditions when things can't be stat'ed. It's a safe guess that it's hitting one of these conditions. One thing for sure, I'm never ever going to launch a Windows box to find out! ;) Good luck man. – Saeven Jun 24 '15 at 14:10
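One way to test the "can't be stat'ed" guess from PHP itself would be to take the names from readdir (which still sees the file) and check which stat-like calls succeed for each of them. This is only a diagnostic sketch with a made-up helper name `statReport`; it does not prove which code path glob actually takes.

```php
<?php
// Hypothetical diagnostic: list entries via readdir() and report which stat-like calls succeed.
function statReport(string $dir): array
{
    $report = [];
    $handle = @opendir($dir);
    if ($handle === false) {
        return $report; // directory could not be opened
    }
    $base = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR;
    while (false !== ($entry = readdir($handle))) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $base . $entry;
        // Drop cached results so every call really hits the filesystem.
        clearstatcache(true, $path);
        $report[$entry] = [
            'file_exists' => file_exists($path),
            'stat'        => @stat($path) !== false,
            'lstat'       => @lstat($path) !== false,
        ];
    }
    closedir($handle);
    return $report;
}
```

If the opened, deduplicated file shows `false` in one of the columns while other files show `true`, that would support the stat theory.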

4 Answers

2

I'm not sure if this is what you're looking for, but I use scandir() to list all the files in a directory; you can then execute any command on them once you know the name. It will work on open files as well.

PHP scandir documentation source

AfikDeri
1

This makes some sense: if the intent of deduplication is not to have duplicates, then the files are being locked and PHP can't see them. The only thing to do is to see whether this limitation applies to scandir() and the SPL directory/filesystem family of iterators as well. If so, it may not be possible to get a list of them.

The only other choice would be to use exec() and some Windows command-line hack to get a list of files and then parse the output. This may be useful:

php exec: does not return output
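A rough sketch of the exec() idea, in case it helps. The helper name `listViaShell` is made up, and it assumes that `dir /a /b` on Windows (or `ls -1A` elsewhere) prints one bare name per line and that exec() is not disabled in php.ini:

```php
<?php
// Hypothetical helper: list directory entries by shelling out instead of using PHP's directory functions.
function listViaShell(string $dir): array
{
    // Assumption: both commands print one bare file name per line.
    $command = PHP_OS_FAMILY === 'Windows'
        ? 'dir /a /b ' . escapeshellarg($dir)
        : 'ls -1A -- ' . escapeshellarg($dir);

    $lines = [];
    $exitCode = 0;
    exec($command . ' 2>&1', $lines, $exitCode);
    if ($exitCode !== 0) {
        return []; // the command failed, e.g. the path does not exist
    }

    // Drop empty lines and reindex the array.
    $names = array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
    sort($names, SORT_STRING);
    return $names;
}
```

Whether the shell sees the opened, deduplicated file is something you would have to try; this only shows how to get and parse the listing.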

Good luck!

ArtisticPhoenix
1

Did you try

    $files = glob('{,.}*', GLOB_BRACE);

It might be possible that the data de-dupe feature is keeping the opened file as a hidden file.

dognose
Rajat Garg
0

Are you open to using a function other than glob()? You could try SPL's recursive iterators and see whether they find an opened file located on a drive that uses the built-in data deduplication feature of Windows Server 2012 R2. You can find an example of how to use them at this link.
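For illustration, a prefix search with the SPL iterators might look roughly like this. The helper name `findByPrefixSpl` is made up, and unlike `glob("/path/abc*")` this sketch also descends into subdirectories:

```php
<?php
// Hypothetical helper: find files whose name starts with $prefix using SPL iterators.
function findByPrefixSpl(string $dir, string $prefix): array
{
    $matches = [];
    // SKIP_DOTS leaves out "." and ".."; the constructor throws
    // UnexpectedValueException if $dir cannot be opened.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        $name = $fileInfo->getFilename();
        // Case-sensitive "starts with" check on the bare file name.
        if (strncmp($name, $prefix, strlen($prefix)) === 0) {
            $matches[] = $fileInfo->getPathname();
        }
    }
    sort($matches, SORT_STRING);
    return $matches;
}
```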

  • This works. However, as mentioned, I know working alternatives; I'm looking for the (programmatic) reason for this different behavior. – dognose Jul 01 '15 at 21:07