I have a simple_one_for_one supervisor with gen_fsm children. I want each gen_fsm child to send a message only the last time it terminates. Is there any way to know which cycle is the last one?

Here's my supervisor:

-module(data_sup).

-behaviour(supervisor).

%% API
-export([start_link/0,create_bot/3]).

%% Supervisor callbacks
-export([init/1]).

%%-compile(export_all).


%%%===================================================================
%%% API functions
%%%===================================================================

start_link() ->
  supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
 RestartStrategy = {simple_one_for_one, 0, 1},
 ChildSpec = {cs_fsm, {cs_fsm, start_link, []},
 permanent, 2000, worker, [cs_fsm]},
 Children = [ChildSpec],
 {ok, {RestartStrategy, Children}}.

create_bot(BotId, CNPJ,Pid) ->
  supervisor:start_child(?MODULE, [BotId, CNPJ, Pid]).

Pid is the pid of the process that starts the supervisor and tells it to start the children.

-module(cs_fsm).

-behaviour(gen_fsm).
-compile(export_all).

-define(SERVER, ?MODULE).
-define(TIMEOUT, 5000).

-record(params, {botId, cnpj, executionId, pid}).

%%%===================================================================
%%% API
%%%===================================================================

start_link(BotId, CNPJ, Pid) ->
  io:format("start_link...~n"),
  Params = #params{botId = BotId, cnpj = CNPJ, pid = Pid},
  gen_fsm:start_link(?MODULE, Params, []).


%%%===================================================================
%%% gen_fsm callbacks
%%%===================================================================

init(Params) ->
  io:format("initializing~n"),
  process_flag(trap_exit, true),
  {ok, requesting_execution, Params, 0}.

requesting_execution(timeout,Params) ->
  io:format("requesting execution~n"),
  {next_state, finished, Params,?TIMEOUT}.

finished(timeout, Params) ->
  io:format("finished :)~n"),
  {stop, normal, Params}.

terminate(shutdown, _StateName, Params) ->
  Params#params.pid ! {terminated, self(),Params},
  ok;

terminate(_Reason, _StateName, _Params) ->
  ok.

My point is that if the process fails in any of the states, it should send a message only if this is the last time it will be restarted by the supervisor (according to its restart strategy).

If the gen_fsm fails, does it restart in the same state with the same state data? If not, how can I make that happen?

ipinak
dina
  • OK, now after your further explanation I understand that it's not about the `gen_fsm` terminating because of some expected trigger but terminating abnormally because the process died? You just want to count when supervisor restarts `cs_fsm` and send the message when the supervisor isn't going to restart it any more in case it dies again? – Greg Feb 25 '16 at 13:29
  • yes. is there a way to do that? – dina Feb 25 '16 at 13:47
  • And to make my question more general: can a supervisor make its child terminate differently the last time it restarts it? – dina Feb 25 '16 at 13:54
  • How would the supervisor know that it has restarted the child for the last time? Until the child actually crashes, the supervisor can't tell that it's the last crash, because it doesn't know if or when the child is going to crash at all. Only after the crash can the supervisor deduce whether it's allowed to restart the child once more or not. See my fuller answer in the edit. BTW, why would you want to know that the child has been restarted for the last time? If the supervisor can't restart the child because it crashed too many times, it's an unrecoverable situation and the supervisor ought to die itself! – Greg Feb 25 '16 at 15:21
  • I want the child to report the reason for crashing to a certain process, but only the last time it crashes. Maybe I should keep a count in the state record (so I avoid using an ETS table), increment it in terminate, and then I can know how many times it ran. BTW, when my gen_fsm child crashes, does it get restarted in its previous state? – dina Feb 25 '16 at 15:46
  • Answered your last question in another edit. In general it would be better to prevent the crash or handle it somehow inside the `gen_fsm` process (I provided two examples how to do that) rather than allowing `gen_fsm` to crash and trying to add some logic to handle that to the supervisor. – Greg Feb 25 '16 at 16:38

1 Answer

You can send the message from the Module:terminate/3 function, which is called when one of the StateName functions returns {stop,Reason,NewStateData} to indicate that the gen_fsm should be stopped.

gen_fsm is a finite state machine, so you decide how it transitions between states. Whatever triggers the last cycle can also set a flag in the StateData that is passed to the Module:StateName/2 (or Module:StateName/3) function, so that the function handling the state knows it's the last cycle. It's hard to give a more specific answer unless you provide some code which we could analyze and comment on.
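
As a rough sketch of that idea: the module below keeps a last_cycle flag in its state data and only sends a message from terminate/3 when the flag is set. The module name last_cycle_fsm, the finish/1 trigger, the #data{} record and the message shape are all made up for illustration; they are not from your code. gen_fsm is deprecated in recent OTP releases in favour of gen_statem, so the sketch silences that warning.

```erlang
-module(last_cycle_fsm).
-behaviour(gen_fsm).
%% gen_fsm is deprecated in favour of gen_statem; silence the warning here.
-compile(nowarn_deprecated_function).

-export([start_link/1, finish/1]).
-export([init/1, running/2, handle_event/3, handle_sync_event/4,
         handle_info/3, terminate/3, code_change/4]).

%% Hypothetical state data: who to notify, and whether this is the last cycle.
-record(data, {notify, last_cycle = false}).

start_link(Notify) ->
    gen_fsm:start_link(?MODULE, Notify, []).

%% Hypothetical trigger that announces the final cycle.
finish(Fsm) ->
    gen_fsm:send_event(Fsm, finish).

init(Notify) ->
    {ok, running, #data{notify = Notify}}.

%% The event that starts the last cycle also sets the flag.
running(finish, Data) ->
    {stop, normal, Data#data{last_cycle = true}};
running(_Event, Data) ->
    {next_state, running, Data}.

%% Only notify when the flag says this was the last cycle.
terminate(Reason, _StateName, #data{notify = Notify, last_cycle = true}) ->
    Notify ! {terminated, self(), Reason},
    ok;
terminate(_Reason, _StateName, _Data) ->
    ok.

handle_event(_Event, StateName, Data) -> {next_state, StateName, Data}.
handle_sync_event(_Event, _From, StateName, Data) ->
    {reply, ok, StateName, Data}.
handle_info(_Info, StateName, Data) -> {next_state, StateName, Data}.
code_change(_OldVsn, StateName, Data, _Extra) -> {ok, StateName, Data}.
```

This only covers terminations that the state machine itself decides on; it does not help with crashes, where the flag is never set.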

EDIT after further clarification:

A supervisor doesn't tell its children how many times it has restarted them, and it also can't tell a child that this is the last restart. The latter is simply because it doesn't know that it's going to be the last one until the child process actually crashes once more, which the supervisor can't possibly predict. Only after the child has crashed can the supervisor calculate how many times the child crashed during the given period and whether it is allowed to restart the child once more, or whether that was the last restart and it's now time for the supervisor to die as well.
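
Since the supervisor itself dies once the restart limit is exceeded (it exits with reason shutdown in that case), an interested process can at least find out after the fact by monitoring the supervisor. A minimal sketch, assuming the supervisor is registered under a name as in your data_sup; the module sup_watch and the function await_down/2 are hypothetical:

```erlang
-module(sup_watch).
-export([await_down/2]).

%% Block until the process registered as SupName exits, or until
%% Timeout milliseconds have passed.
await_down(SupName, Timeout) ->
    Ref = erlang:monitor(process, SupName),
    receive
        {'DOWN', Ref, process, _Object, Reason} ->
            {down, Reason}
    after Timeout ->
        %% Stop monitoring and drop any late 'DOWN' message.
        erlang:demonitor(Ref, [flush]),
        timeout
    end.
```

Note that this reports that the supervisor gave up, not why the child crashed; the crash reason would still have to be recorded somewhere by the child.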

However, nothing is stopping the child from recording, e.g. in an ETS table, how many times it has been restarted. Of course, that won't help with deducing which restart is the last one.
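
A minimal sketch of such a counter, assuming the table is created by a process that outlives the children (an ETS table is destroyed when its owner exits) and that each child has a stable key such as your BotId. The module name restart_counter and the table name restart_counts are made up for illustration:

```erlang
-module(restart_counter).
-export([new/0, bump/1]).

%% Call once from a long-lived process, for example from the
%% supervisor's init/1, so the table survives child crashes.
new() ->
    ets:new(restart_counts, [named_table, public, set]).

%% Call from the child's init/1. Inserts {BotId, 0} if the key is
%% missing, then increments the counter and returns the new value.
bump(BotId) ->
    ets:update_counter(restart_counts, BotId, {2, 1}, {BotId, 0}).
```

The first start of a given BotId returns 1, the first restart returns 2, and so on.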

Edit 2:

When the supervisor restarts the child, it starts it from scratch by calling the start function from the child spec, which in turn runs the Module:init/1 callback. Any state the child had before it crashed is lost.

Please note that a crash is an exceptional situation and it's not always possible to recover the state, because the crash could have corrupted it. Instead of trying to recover the state or asking the supervisor when it's done restarting the child, why not prevent the crash from happening in the first place? You have two options:

I. Use try/catch to catch any exceptional situations and act accordingly. It's possible to catch any error that would otherwise crash the process and cause supervisor to restart it. You can add try/catch to any entry function inside the gen_fsm process so that any error condition is caught before it crashes the server. See example function 1 or example function 2:

read() ->
    try
        try_home() orelse try_path(?MAIN_CFG) orelse
            begin io:format("Some Error", []) end
    catch
        throw:Term -> {error, Term}
    end.

try_read(Path) ->
    try
        file:consult(Path)
    catch
        error:Error -> {error, Error}
    end.
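
Applied to your case, the same pattern could wrap the risky work inside a state function and report the failure directly, without relying on a crash. A minimal sketch; the module safe_call, the function run/2 and the {failed, ...} message are hypothetical names:

```erlang
-module(safe_call).
-export([run/2]).

%% Run Fun() and, if it raises, report the failure to Notify
%% instead of letting the calling process crash.
run(Fun, Notify) when is_function(Fun, 0), is_pid(Notify) ->
    try Fun() of
        Result ->
            {ok, Result}
    catch
        Class:Reason ->
            Notify ! {failed, self(), {Class, Reason}},
            {error, {Class, Reason}}
    end.
```

A state function could then match on {ok, Result} or {error, _} and decide whether to move to the next state or stop.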

II. Spawn a new process to handle the job and trap the EXIT signal when that process dies. This allows the gen_fsm to handle a job asynchronously and to handle any errors in a custom way (not necessarily by restarting the process, as a supervisor would do). This section titled Error Handling explains how to trap exit signals from child processes. And this is an example of trapping signals in a gen_server. Check the handle_info function, which contains a few clauses that handle different types of EXIT messages from child processes.

init([Cfg, Id, Mode]) ->
    process_flag(trap_exit, true),
    (...)


handle_info({'EXIT', _Pid, normal}, State) ->
    {noreply, State};
handle_info({'EXIT', _Pid, noproc}, State) ->
    {noreply, State};
handle_info({'EXIT', Pid, Reason}, State) ->
    log_exit(Pid, Reason),
    check_done(error, Pid, State);
handle_info(_, State) ->
    {noreply, State}.
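
In a gen_fsm the equivalent callback is handle_info/3. A minimal sketch of option II, assuming the job is passed in as a zero-arity fun; the module name job_fsm, the state name working and the {job_failed, ...} message are made up for illustration:

```erlang
-module(job_fsm).
-behaviour(gen_fsm).
%% gen_fsm is deprecated in favour of gen_statem; silence the warning here.
-compile(nowarn_deprecated_function).

-export([start_link/2]).
-export([init/1, working/2, handle_event/3, handle_sync_event/4,
         handle_info/3, terminate/3, code_change/4]).

%% Job is a zero-arity fun; Notify is the pid that wants failure reports.
start_link(Job, Notify) ->
    gen_fsm:start_link(?MODULE, {Job, Notify}, []).

init({Job, Notify}) ->
    %% Turn exit signals from linked processes into messages.
    process_flag(trap_exit, true),
    Worker = spawn_link(Job),
    {ok, working, #{worker => Worker, notify => Notify}}.

working(_Event, Data) ->
    {next_state, working, Data}.

%% The worker finished cleanly.
handle_info({'EXIT', Worker, normal}, working, #{worker := Worker} = Data) ->
    {stop, normal, Data};
%% The worker crashed: report the reason instead of crashing ourselves.
handle_info({'EXIT', Worker, Reason}, working,
            #{worker := Worker, notify := Notify} = Data) ->
    Notify ! {job_failed, self(), Reason},
    {stop, normal, Data};
handle_info(_Info, StateName, Data) ->
    {next_state, StateName, Data}.

handle_event(_Event, StateName, Data) -> {next_state, StateName, Data}.
handle_sync_event(_Event, _From, StateName, Data) ->
    {reply, ok, StateName, Data}.
terminate(_Reason, _StateName, _Data) -> ok.
code_change(_OldVsn, StateName, Data, _Extra) -> {ok, StateName, Data}.
```

Because the gen_fsm itself never crashes here, it is free to decide when a failure is final and to notify exactly once.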
Greg
  • Thanks!! I'm actually using an ETS table in the child processes. As for handling the errors, I prefer the 'let it crash' method. – dina Feb 28 '16 at 12:24
  • "Let it crash philosophy" doesn't mean that you shouldn't catch errors when you can. It only means you shouldn't code defensively. It's better to use try-catch than trying to add more logic to handle failures. See this slide: https://qconlondon.com/london-2011/qconlondon.com/dl/qcon-london-2011/slides/SteveVinoski_LetItCrashExceptWhenYouShouldnt.pdf That's a general comment because I don't know the exact design of your architecture :) – Greg Feb 28 '16 at 15:01