I have Python code running (via streamparse) on Apache Storm 1.1.1, and recently noticed that the Storm worker keeps crashing. Below is what I found in the worker log. I have run out of ideas about what the culprit could be, as the log doesn't give me enough clues. The topology worked fine before. Any ideas on where else I can start looking?

2019-08-28 15:05:32.947 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Launched subprocess with pid 10054
2019-08-28 15:05:32.951 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Opened spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Activating spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Start checking heartbeat...
2019-08-28 15:05:32.961 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Async loop died!
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
        at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
        ... 6 more
2019-08-28 15:05:32.968 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [ERROR]
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
        at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
        ... 6 more
2019-08-28 15:05:33.009 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
        at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
        at org.apache.storm.daemon.worker$fn__5632$fn__5633.invoke(worker.clj:763) [storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.executor$mk_executor_data$fn__4848$fn__4849.invoke(executor.clj:276) [storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.1.jar:1.1.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2019-08-28 15:05:33.018 o.a.s.d.worker Thread-16 [INFO] Shutting down worker tmon-4-1567019114 ba5b3695-b390-4c3e-9d92-af0771f17b86 6700
Marcel Gosselin
z11373

1 Answer

Whenever I see a Serializer Exception in external process bolts (e.g. Python bolts), I suspect the external process is printing something to the stdout stream.

Storm uses the stdin/stdout of bolt processes for its own communication, so any logging in Python bolts should go to stderr or to a file.
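As a rough illustration of the point above, here is a minimal sketch of a logger that writes to stderr only, so stdout stays free for Storm's own messages. The helper name `make_stderr_logger` is hypothetical (it is not part of streamparse or Storm), and `event_spout` is simply the component name taken from the log in the question. It only covers your own logging calls; a stray `print` in an imported library would still need to be tracked down separately.

```python
import logging
import sys


def make_stderr_logger(name, stream=None):
    """Return a logger that writes to stderr (or the given stream), never stdout.

    Hypothetical helper for illustration; not a streamparse or Storm API.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not pass records up to the root logger, whose handlers
    # might have been configured elsewhere to write to stdout.
    logger.propagate = False
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace any previously attached handlers
    return logger


log = make_stderr_logger("event_spout")
log.info("spout starting")  # written to stderr; stdout is left untouched
```

To log to a file instead, the same idea should work with `logging.FileHandler` in place of `logging.StreamHandler`.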

Re'em
Thanks @Re'em! I have looked at our Python code, and I don't see any print statements there. However, you're right: earlier I thought it was a network issue, but then I scaled down to one node and still saw the error. I also think something is wrong with the JSON serialization between those processes, as you said. I looked at the streamparse log and the Storm worker log, and I am still not able to figure out the culprit. For extra info: it worked fine earlier, then I deployed my change and started seeing the error, but even after I reverted it, it still gave the same error, so it is likely not due to my change. – z11373 Aug 29 '19 at 17:56