Questions tagged [streamparse]

Streamparse is a software package that runs Python code inside the Java-based Apache Storm.

Quoting the streamparse GitHub project:

Streamparse lets you run Python code against real-time streams of data via Apache Storm. With streamparse you can create Storm bolts and spouts in Python without having to write a single line of Java. It also provides handy CLI utilities for managing Storm clusters and projects.

The Storm/streamparse combo can be viewed as a more robust alternative to Python worker-and-queue systems, as might be built atop frameworks like Celery and RQ. It offers a way to do "real-time map/reduce style computation" against live streams of data. It can also be a powerful way to scale long-running, highly parallel Python processes in production.
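
To make the "real-time map/reduce style computation" idea concrete, here is a minimal pure-Python sketch of a word count over a stream. It does not use streamparse or Storm at all: the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and only mimic the spout-to-bolt flow that a real topology would spread across workers.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a spout: emits one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Map step: split each sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Reduce step: keep a running count and emit it after every word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split -> count and return the last count seen per word."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        final[word] = count
    return final
```

Because each stage is a generator, words are counted as they arrive rather than after the whole input is read, which is the property the Storm/streamparse combination provides at cluster scale.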

16 questions
9
votes
2 answers

Streamparse wordcount example

I have been wanting to use Apache Storm to stream from Kafka. I am more comfortable with Python, so I decided to use streamparse (https://github.com/Parsely/streamparse). The word count example is the introductory example. I have been trying to get…
red_devil
  • 1,009
  • 2
  • 13
  • 23
7
votes
1 answer

Submitting offsets to kafka after storm batch

What would be the correct way to submit only the highest offset of every partition when a batch bolt finishes processing a batch? My main concern is machines dying while processing batches, as the whole shebang is going to run on AWS spot instances. I…
SimSimY
  • 3,616
  • 2
  • 30
  • 35
4
votes
0 answers

Apache storm stream parse in Windows

I'm new to Apache Storm. I'm trying to run Apache Storm with streamparse on Windows 10, so I tried following the quickstart (http://streamparse.readthedocs.io/en/master/quickstart.html). First, install Python 3.5 and JDK 1.8.0_131. Second,…
4
votes
0 answers

Python Celery and Apache Storm comparison

The requirements are distributed task processing and programming tasks in Python for a high message rate. How do Celery and Storm (with streamparse) compare on the following criteria: Scalability: not only in terms of workers, but also in context of…
Confused
  • 617
  • 1
  • 9
  • 17
4
votes
0 answers

Streamparse/Python - custom fail() method not working for error tuples

I'm using Storm to process messages off of Kafka in real time and using streamparse to build my topology. For this use case, it's imperative that we have a 100% guarantee that any message into Storm is processed and ack'd. I have implemented logic on…
ctpaquette
  • 118
  • 1
  • 1
  • 7
2
votes
0 answers

Spout prematurely acks, even failed Bolt tuples

I'm using the Python Storm library streamparse (which utilizes pystorm underneath). I've had problems calling a Spout's fail() method in the boilerplate wordcount project. According to the pystorm quickstart docs and numerous things I've read,…
jgujgu
  • 51
  • 1
  • 2
  • 7
1
vote
1 answer

Upgrading topology from Python 2 to 3

I had a Streamparse topology that was originally developed using Python 2. I am now trying to upgrade it to Python 3 using the 2to3 tool. I have also upgraded Streamparse to 3.15.1 (not sure which version the topology was originally developed…
K G
  • 1,715
  • 6
  • 21
  • 29
1
vote
2 answers

StreamParse: IOError: Local port: 6627 already in use, unable to open ssh tunnel to nimbus.server.local:6627

Setup: Storm 0.10.0, Streamparse 2.1.4, CentOS 6.5, Python 2.7 (Streamparse needs it). (Yes, I know they are outdated; however, I couldn't get anything working with Storm 1.0, it's just broken with streamparse 3.) When I attempt to launch a "streamparse…
Adam Bradbury
  • 93
  • 2
  • 9
1
vote
0 answers

storm streamparse spout "Complete latency" always at 0

I have used streamparse for a while now, but I'm stuck on one issue. We use storm-0.10.0 and streamparse==2.1.4. We left all the default values (no auto_anchor = False or anything like that). We have no ack or fail method implemented in the spout and we…
Bruno
  • 11
  • 3
1
vote
0 answers

attempt to call unbound fn

In my case, the streamparse API was used to run locally and to submit code to a Storm cluster. When I ran it locally it was fine, but when it was submitted to the Storm cluster, I got java.lang.RuntimeException: java.lang.IllegalStateException: Attempting to call…
1
vote
0 answers

Storm UI with streamparse

I am working on a streamparse project on an AWS instance with ZooKeeper and Nimbus installed. I want to use the Storm UI. I ran sparse submit with the following config.json file: { "library": "", "topology_specs": "topologies/", …
dev
  • 2,474
  • 7
  • 29
  • 47
0
votes
1 answer

how to troubleshoot apache storm worker crash

I have Python code running (via streamparse) on Apache Storm 1.1.1, and recently noticed that the Storm worker keeps crashing. Below is what I found in the worker log. I have run out of ideas about what the culprit could be, as the log doesn't give me enough clues.…
z11373
  • 1
0
votes
0 answers

Trying to submit streamparse wordcount example on a local storm docker container

I'm trying to use streamparse on a local cluster with Docker containers. Here's my docker-compose.yml: version: '3' services: zookeeper: image: zookeeper container_name: zookeeper nimbus: image: storm:1.0.2 container_name:…
Ftagn
  • 195
  • 3
  • 19
0
votes
1 answer

Submit topology to storm cluster through streamparse

I am trying to use streamparse to develop and submit topologies to a Storm cluster. Since streamparse has a default wordcount topology to help users test the cluster, most of the tutorials I could find online are about submitting this default…
Lingbo
  • 31
  • 1
  • 7
0
votes
2 answers

How do I install streamparse from source?

I need to use streamparse on a CentOS machine that does not have internet access, meaning I cannot use pip. The only net-enabled services I can use are scp and ssh. My plan is to get streamparse on my local machine (Ubuntu) and then scp the…
offwhitelotus
  • 1,049
  • 9
  • 15