
I encountered this problem while using Scrapy's FifoDiskQueue. On Windows, FifoDiskQueue causes directories and files to be created through one file descriptor and consumed (and, once no more messages remain in the queue, removed) through another.

I randomly get error messages like the following:

2015-08-25 18:51:30 [scrapy] INFO: Error while handling downloader output
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 154, in _handle_downloader_output
    self.crawl(response, spider)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 182, in crawl
    self.schedule(request, spider)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 188, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 54, in enqueue_request
    dqok = self._dqpush(request)
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 83, in _dqpush
    self.dqs.push(reqd, -request.priority)
  File "C:\Python27\lib\site-packages\queuelib\pqueue.py", line 33, in push
    self.queues[priority] = self.qfactory(priority)
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 106, in _newdq
    return self.dqclass(join(self.dqdir, 'p%s' % priority))
  File "C:\Python27\lib\site-packages\queuelib\queue.py", line 43, in __init__
    os.makedirs(path)
  File "C:\Python27\lib\os.py", line 157, in makedirs
    mkdir(name, mode)
WindowsError: [Error 5] : './sogou_job\\requests.queue\\p-50'

In Windows, Error 5 means access is denied. Many explanations on the web attribute it to a lack of administrative rights, like this MSDN post. But the cause is not access rights: when I run the `scrapy crawl` command in an Administrator command prompt, the problem still occurs.

I then created a small test script to try on Windows and Linux:

#!/usr/bin/python

import os
import shutil
import time

for i in range(1000):
    somedir = "testingdir"
    try:
        os.makedirs(somedir)
        with open(os.path.join(somedir, "testing.txt"), 'w') as out:
            out.write("Oh no")
        shutil.rmtree(somedir)
    except WindowsError as e:
        print 'round', i, e
        time.sleep(0.1)
        raise

When I run this, I get:

round 13 [Error 5] : 'testingdir'
Traceback (most recent call last):
  File "E:\FHT360\FHT360_Mobile\Source\keywordranks\test.py", line 10, in <module>
    os.makedirs(somedir)
  File "C:\Users\yj\Anaconda\lib\os.py", line 157, in makedirs
    mkdir(name, mode)
WindowsError: [Error 5] : 'testingdir'

The failing round differs every time. So if I remove the `raise` at the end, I get something like this:

round 5 [Error 5] : 'testingdir'
round 67 [Error 5] : 'testingdir'
round 589 [Error 5] : 'testingdir'
round 875 [Error 5] : 'testingdir'

It simply fails randomly, with a small probability, ONLY on Windows. I tried this test script under Cygwin and on Linux, and the error never happens there. I also tried the same code on another Windows machine, and it occurs there too.

What are possible reasons for this?

[Update] Screenshot of proof (管理员 means Administrator in Chinese): [screenshot]

Also proof that the test case still fails in an administrator command prompt:

[screenshot]

@ρss said that he couldn't reproduce the issue, so I tried our Windows 7 server with a fresh install of 64-bit Python 2.7.10. I had to set a much larger upper bound on the number of rounds, and errors only started appearing after round 19963:

[screenshot]

foresightyj
  • I don't see a question - but this does sound like an excellent bug report. Maybe you should post this at the [issue tracker for scrapy](https://github.com/scrapy/scrapy/issues). – Burhan Khalid Aug 27 '15 at 07:37
  • @BurhanKhalid I just made it a question by adding one line at the end. It is not a scrapy-related question, so I won't bother them at the moment. – foresightyj Aug 27 '15 at 07:39
  • 2
    I'd try to turn off antivirus temporarily. Most Windows boxes have one and they're mostly crappy. – Tometzky Aug 27 '15 at 07:48
  • Are you able to execute other commands in Windows CMD that require admin rights? – ρss Aug 27 '15 at 07:51
  • I don't know which commands require admin rights. I tried `msconfig` and `shutdown -s -t 20000` and they both work. I don't even have to use an admin command prompt; a normal command prompt will also do. – foresightyj Aug 27 '15 at 07:58
  • @Tometzky I also tried to turn off my antivirus. The error still occurs. – foresightyj Aug 27 '15 at 08:00
  • 1
    Try `arp -d`. Might cause a little network downtime. https://technet.microsoft.com/en-us/library/cc758357%28v=ws.10%29.aspx – ρss Aug 27 '15 at 08:08
  • 1
    @ρss, That one indeed requires administrator rights. I attached screenshots of my trial result at the end of my question. – foresightyj Aug 27 '15 at 08:16
  • I can't reproduce this using your script (corrected with 2to3) on [WinPython](http://winpython.github.io/) 3.4.3.5 64-bit on Windows 7 Enterprise, as a limited user, on a local drive and on a remote share. It runs with no error. Maybe a bad Python distribution. – Tometzky Aug 27 '15 at 08:23
  • @tometzky, Can you try with more rounds, like 10000 rounds? I don't have this problem with 100 rounds. – foresightyj Aug 27 '15 at 08:24
  • At least this proves that it's not an admin rights issue. +1, now your question looks complete and clear. – ρss Aug 27 '15 at 08:26
  • 1
    Strange - I'm sometimes getting another error `OSError: [WinError 145] Directory is not empty: 'testingdir'` on random high round (for example 3689 or 1488). I did not disable antivirus though. – Tometzky Aug 27 '15 at 08:37
  • @Tometzky, that is expected. See the doc for `os.makedirs`, which states: **If exist_ok is False (the default), an OSError is raised if the target directory already exists.** In my case, Windows doesn't see that the directory exists. Note also that it calls `os.mkdir` internally in this case, thus a different exception. – foresightyj Aug 27 '15 at 08:40
  • @Tometzky, Oh, sorry. I just realized you were still talking about my test script. Hmm, interesting... – foresightyj Aug 27 '15 at 08:41
  • 2
    Might be [related to file indexing service](http://stackoverflow.com/a/3764322/15862). Windows is _different_. – Tometzky Aug 27 '15 at 08:43
  • @Tometzky. Right on! I will check whether `tempfile` and `tempdir` have this problem too. Thanks for pointing me in the right direction! – foresightyj Aug 27 '15 at 08:51
  • You might have more luck implementing a persistent queue using SQLite instead of queuelib. – Tometzky Aug 27 '15 at 09:09
  • @Tometzky, Great suggestion! SQLite will probably be more robust than plain files. I will consider doing that in the future. For the moment, I'd probably add a few more try-except clauses to make it retry up to 3 or 5 times... Meanwhile, I am going to raise this concern with the Scrapy team. – foresightyj Aug 27 '15 at 09:17
  • 1
    I can't reproduce this in 64-bit Windows 7, even after letting it loop over 100,000 times. Try printing the last [`NTSTATUS` code](https://msdn.microsoft.com/en-us/library/cc704588). Initial setup: `import ctypes;` `ntdll = ctypes.WinDLL('ntdll');` `ntdll.RtlGetLastNtStatus.restype = ctypes.c_uint`. Then in the `except` block: `s = hex(ntdll.RtlGetLastNtStatus());` `print 'round', i, e, s;` `time.sleep(0.1)`. – Eryk Sun Aug 27 '15 at 10:19
  • @eryksun, It is a small probability. As in the post quoted by Tometzky, some Windows services will try to read the file you created. It might be that you don't have those services, or that your computer is so fast those services finish processing the file really quickly, so you don't experience the same problem. – foresightyj Aug 28 '15 at 02:23
  • 1
    @foresightyj, yes I'm aware of the potential problems with indexing and virus checking services that keep a handle opened with `FILE_SHARE_DELETE`. You're in luck if it shares delete access. That means you can rename the directory to a random name using `os.rename`, and then call `shutil.rmtree` on the renamed directory. It stays in the filesystem until all open handles are closed (this shouldn't take a long time), but it won't interfere with creating a new directory using the original name. I'm still interested to know what the `NTSTATUS` code is in case you situation is different. – Eryk Sun Aug 28 '15 at 02:39
  • @eryksun, I tried it. It prints **round 875 [Error 5] : 'testingdir' 0xc0000056L**, which is `STATUS_DELETE_PENDING`. Thanks for the suggestions. – foresightyj Aug 28 '15 at 02:44
  • 2
    Thank you. `STATUS_DELETE_PENDING` is the expected code for the above scenario. I wish Windows set the error in this case to [`ERROR_DELETE_PENDING`](https://msdn.microsoft.com/en-us/library/ms681382#ERROR_DELETE_PENDING) (303) instead of the generic error `ERROR_ACCESS_DENIED`. – Eryk Sun Aug 28 '15 at 02:48
  • @eryksun. I also tried renaming `somedir` to a `uuid`-generated directory name and then deleting that renamed directory instead. No exceptions. However, it makes the entire loop about 2 times slower. I totally agree: [Error 5] is so confusing. – foresightyj Aug 28 '15 at 02:57
  • I'm just adding this here because I was doing some similar testing on the timestamps of rapidly deleted/created files, which worked perfectly on everything except NTFS. That introduced me to [File System Tunneling](https://support.microsoft.com/en-us/help/172190/windows-nt-contains-file-system-tunneling-capabilities) ([better explanation](https://blogs.msdn.microsoft.com/oldnewthing/20050715-14/?p=34923)), which confused me for a while. It's worth knowing about. – Samuel Harmer Dec 07 '17 at 09:51
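
For reference, here is the test script with eryksun's diagnostic folded in. This is a sketch assembled from the comment above; it only runs on Windows, and `RtlGetLastNtStatus` is an undocumented ntdll export, so treat it as a debugging aid rather than a stable API:

#!/usr/bin/python

import ctypes
import os
import shutil
import time

# From eryksun's comment: expose the last NTSTATUS code recorded by ntdll.
# It is more specific than the Win32 error carried by the WindowsError.
ntdll = ctypes.WinDLL('ntdll')
ntdll.RtlGetLastNtStatus.restype = ctypes.c_uint

for i in range(1000):
    somedir = "testingdir"
    try:
        os.makedirs(somedir)
        with open(os.path.join(somedir, "testing.txt"), 'w') as out:
            out.write("Oh no")
        shutil.rmtree(somedir)
    except WindowsError as e:
        # 0xc0000056 is STATUS_DELETE_PENDING, as confirmed in the comments
        s = hex(ntdll.RtlGetLastNtStatus())
        print 'round', i, e, s
        time.sleep(0.1)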

1 Answer


Short answer: disable any antivirus and document-indexing services, or at least configure them not to scan your working directory.

Long answer: you can spend months trying to fix this kind of problem. So far, the only workaround that does not involve disabling the antivirus is to assume that you will not always be able to remove all files and directories.

Account for this in your code: use a different root subdirectory each time the service starts, and try to clean up the older ones at startup, ignoring removal failures.
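
A minimal sketch of that approach in Python, combined with the rename-before-delete trick from the comments above (the `queues` root directory and the helper names are illustrative, not part of Scrapy or queuelib):

import os
import shutil
import uuid

BASE = "queues"  # hypothetical root holding one subdirectory per run

def fresh_workdir():
    # Each run gets a uniquely named subdirectory, so leftovers from
    # earlier runs can never collide with the current one.
    path = os.path.join(BASE, uuid.uuid4().hex)
    os.makedirs(path)
    return path

def cleanup_old(current):
    # Best-effort removal of directories left over from earlier runs.
    for name in os.listdir(BASE):
        path = os.path.join(BASE, name)
        if path == current or not os.path.isdir(path):
            continue
        try:
            # Rename first: even if a scanner still holds a handle on the
            # directory (delete pending), the original name is freed
            # immediately and cannot collide with new directories.
            doomed = path + ".deleting"
            os.rename(path, doomed)
            shutil.rmtree(doomed)
        except OSError:
            pass  # ignore failures; they will be retried on the next start

workdir = fresh_workdir()
cleanup_old(workdir)

If even the rename fails because some process holds a handle without `FILE_SHARE_DELETE`, the directory is simply left behind for a later run to clean up.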

sorin