
Why this question?

As a noob to programming Windows services, a colleague and I were unable to get our service to start with the net start command. We eventually found the cause and got the service to start, but I don't understand why our solution worked. Since I believe others may run into this problem too, I am posting the question along with an answer that covers how we resolved our situation. I hope others who are more experienced and knowledgeable will contribute more complete answers.

The Question I really want answered

Why does a Windows service, specifically one based on Python, fail to start without returning an error?

The hardest part of this problem was that no error was returned. We looked in the event log and in the command prompt where we ran net start, but there was no error information to be had.

Problem Details

The problem occurred while we were creating a Windows service based on a Python module. We followed the method provided here to install a Windows service that uses our Python module.

Running the service in debug mode like this worked (more on running here):

python service_definition.py debug

The code in service_definition.py is based on the examples in the links above and is repeated below:

import sys

from SMWinservice import SMWinservice
from cress import server_interface


class server_interface_backgndproc(SMWinservice):
    _svc_name_ = "server_interface_backgndproc"
    _svc_display_name_ = "CrowdRender SIP"
    _svc_description_ = "CrowdRender Server Interface Process"

    def start(self):
        # Called by SMWinservice before main(): mark the service as running
        self.isrunning = True
        # Pass cress its options. extend() adds each string separately;
        # append() would add the whole list as one nested element.
        sys.argv.extend([
            "--",
            # "--cr_session_type", "agent",
            "--cr_cress_mode", "SIP",
        ])

    def stop(self):
        # Called by SMWinservice when the service is asked to stop
        self.isrunning = False

    def main(self):
        # Hand control to the CrowdRender machine manager
        server_interface.CRMachineManager()


if __name__ == '__main__':
    # Handles install / start / stop / debug from the command line
    server_interface_backgndproc.parse_command_line()
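One detail worth checking in the service code: `list.append()` with a list argument adds that list to `sys.argv` as a single nested element, whereas `list.extend()` adds each string separately, which is what command-line parsers expect. The minimal sketch below shows the difference using the same options; the script name is only a placeholder, and we have not confirmed whether this detail played any part in the startup failure.

```python
# Simulate the argument list the process starts with (placeholder script name).
argv_appended = ["service_definition.py"]
argv_extended = ["service_definition.py"]

extra_args = ["--", "--cr_cress_mode", "SIP"]

# append() adds the whole list as ONE element, producing a nested list.
argv_appended.append(extra_args)

# extend() adds each string as its own element.
argv_extended.extend(extra_args)

print(argv_appended)  # ['service_definition.py', ['--', '--cr_cress_mode', 'SIP']]
print(argv_extended)  # ['service_definition.py', '--', '--cr_cress_mode', 'SIP']
```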

When it came time to start the service using either net start 'service name' or the Services panel in Task Manager, the service did not start. Using net start, we got the following output:

net start 'service name'

Windows could not start the 'service name' service on Local Computer. 
The service did not return an error. 
This could be an internal Windows error or an internal service error. 
If the problem persists, contact your system administrator.

jeducious

Answer


To be clear, I am answering my own question with the specifics of what caused the problem and how we remedied it. This may not be the best answer: it focuses on one specific cause and doesn't explain how Windows services report errors (or fail to), or the best way to debug them.

What we found

Contrary to the output of the net start 'service name' command, there was in fact an error; it just was not reported (or returned).

We solved the problem in our code (a good rule of thumb in my book is that, most of the time, your code's bad behaviour is your own fault, not someone else's xD) with good old logging statements, placed throughout the code to show how far into the program we got before the logging stopped. Finding the spot was a real issue, since we had no other way of telling where in our code base the problem was. Luckily, it was within the first 100 lines or so.

Here's the part of the code that we found was failing:

# Set up signal handling
signal.signal(signal.SIGTERM, self.handle_signal)

Once we removed the registering of the signal handler for SIGTERM, the service worked perfectly using the net start command.
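Rather than deleting the registration outright, one option is to register the handler only where CPython permits it, so it still works when the module runs from a console. This is a minimal sketch, assuming (not confirmed by our debugging) that the failure comes from CPython's rule that `signal.signal()` may only be called from the main thread. `handle_signal` and `register_sigterm_if_possible` are hypothetical names. In the SMWinservice pattern shown in the question, stopping goes through the class's `stop()` method, so shutdown cleanup could live there instead.

```python
import signal
import threading


def handle_signal(signum, frame):
    # Hypothetical handler: real cleanup (closing sockets, logging shutdown) goes here.
    print(f"received signal {signum}")


def register_sigterm_if_possible(handler):
    """Register handler for SIGTERM only when called from the main thread.

    CPython raises ValueError if signal.signal() is called from any other
    thread, so skip registration there instead of crashing.
    """
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signal.SIGTERM, handler)
    return True


# Usage: True from the main thread, False from a worker thread.
registered_in_main = register_sigterm_if_possible(handle_signal)

worker_result = []
worker = threading.Thread(
    target=lambda: worker_result.append(register_sigterm_if_possible(handle_signal))
)
worker.start()
worker.join()

print(registered_in_main, worker_result)  # True [False]
```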

We also found the same issue in another module, with

atexit.register(self.shutdown)

The handlers in both cases close network sockets and shut down Python's logging. Removing the calls that register these handlers fixed our problem in both cases and allowed the services to run.

So in our case we have a solution to our problem, but not an answer to the main question of why it happened. We can only presume that either Windows raises an error because it does not support our module registering its own signal handlers, or there is a problem with the functions called in our handlers.

The second seems unlikely, since these handlers have worked in production for years. The first seems plausible, since a signal handler could potentially block an attempt to stop a service by never returning; perhaps Windows doesn't like that. We're not sure.
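One mechanism that would fit the symptoms, offered only as a hypothesis: the error may come from Python itself rather than from Windows. CPython allows `signal.signal()` only in the main thread; anywhere else it raises `ValueError`. If the service host runs our code on a thread other than Python's main thread, registering the SIGTERM handler would raise, and with no console attached the traceback has nowhere visible to go. The sketch below reproduces the error outside any service by registering from a worker thread. It does not explain the `atexit` case.

```python
import signal
import threading

errors = []


def try_register():
    try:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
    except ValueError as exc:
        # CPython refuses: signal handlers can only be set from the main thread.
        errors.append(str(exc))


worker = threading.Thread(target=try_register)
worker.start()
worker.join()

print(errors)
```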

In summary, if you are writing a Windows service in Python, be wary of signal handling: it caused our code to fail with no clue as to where the problem was, and it resulted in downtime while we bisected our code to find the issue.

I hope others with more insight into the internal mechanics of the Windows services system will comment on or answer the main question: why do services fail to start without returning an error?

jeducious