Questions tagged [streamsets]

Use the streamsets tag for questions about the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
43
votes
4 answers

Difference between Apache NiFi and StreamSets

I am planning a class project and was going through a few technologies that can automate or set up the flow of data between systems. I found that there are a couple of them, i.e. Apache NiFi and StreamSets (to my knowledge). What I couldn't…
Goutam
  • 1,337
  • 8
  • 22
  • 41
10
votes
3 answers

Kafka vs StreamSets

I was reading articles about Kafka and StreamSets, and my understanding was that Kafka acts as a broker between the producer system and the subscriber. Producers push data into the Kafka cluster, and subscribers pull data from Kafka. StreamSets is a…
NikRED
  • 1,175
  • 2
  • 21
  • 39
4
votes
0 answers

Unable to query for default vendor from RPM: Error while executing process. while running streamsets

I followed the tutorial at https://github.com/streamsets/datacollector/blob/master/BUILD.md for the latest StreamSets 2.6.6 with Flume as the data collector. While making the build, I faced the following error: [ERROR] Failed to…
sathya
  • 1,982
  • 1
  • 20
  • 37
3
votes
2 answers

Streamsets: SpoolDIR_01 Failed to process file

Hi, I'm trying to run a pipeline to process a very large file (about 4 million records). Every time it reaches around 270,000 records, it fails, stops processing any more records, and returns this error: '/FileLocation/FiLeNAME..DAT' at position…
MichelleNZ
  • 45
  • 3
3
votes
1 answer

Appending UUID in file name when streaming via StreamSets Data Collector

I am using the HttpClient origin to stream a file from an HTTP URL to a Hadoop destination, but the file name in the destination has a random UUID appended. I want the file name to be the same as in the source. Example: source file name is…
3
votes
1 answer

What is the StreamSets architecture?

I am not very clear about the architecture even after going through the tutorials. How do we scale StreamSets in a distributed environment? Let's say our input data velocity from the origin increases; how do we ensure that SDC doesn't give performance…
Aman Raturi
  • 99
  • 1
  • 8
3
votes
1 answer

Connecting Spark streaming to streamsets input

I was wondering if it would be possible to provide input to Spark Streaming from StreamSets. I noticed that Spark Streaming is not supported among the StreamSets connector destinations: https://streamsets.com/connectors/ . I am exploring if there are…
pjesudhas
  • 399
  • 4
  • 13
3
votes
1 answer

Streamsets solrcloud on CDH 5.7 unable to connect to Solr

I am using StreamSets on CDH version 5.7.0. I have a sample workflow that loads a file from HDFS (origin) and creates records in Solr (destination). It is failing on validation: SOLR_03 - Could not connect to the Solr instance:…
user2023507
  • 1,153
  • 12
  • 23
2
votes
0 answers

Inverted exclamation mark gets added in the output written by Google cloud storage

Thank you in advance for the support. Using a StreamSets pipeline, I am trying to read MSSQL CDC data with the SQL Server CDC Client origin and load it into destinations in Google Cloud Storage and the local FS. While the local FS writes as expected, the…
Hari
  • 441
  • 6
  • 15
2
votes
0 answers

StreamSets CEF parsing issues

We send messages from ArcSight to a StreamSets pipeline using Kafka. We are having trouble parsing the messages from Kafka in the pipeline. The data sent from ArcSight is sometimes partitioned into chunks, which means that a huge script will be…
gabi939
  • 107
  • 2
  • 8
2
votes
3 answers

How to perform an Elasticsearch lookup in StreamSets

I am accepting two kinds of records, A and B, in StreamSets v3.21. A field called correlationid is common to the parent type A and multiple child records of type B. Type A always arrives first. Type A and Type B get written to separate…
bigbadmouse
  • 216
  • 1
  • 11
2
votes
1 answer

How to enable StreamSets multitenancy using LDAP authentication

I am using StreamSets Data Collector version 3.19.1 and am currently trying to integrate StreamSets with an LDAP server for authentication. The integration itself works, but we are facing difficulties configuring the roles and groups like the…
Pradeep M
  • 871
  • 1
  • 6
  • 9
2
votes
1 answer

No resources found in streamset-ns namespace

I need some assistance to find out what I'm missing here. I'm trying to deploy the StreamSets application from a customized values.yml file for my lab (localhost-master). I'm trying to deploy the pod in the "streamset-ns" namespace. I'm facing the below…
2
votes
1 answer

Special characters handling with backticks/backquotes in StreamSets Data Collector

My source fields have special characters and need to be enclosed in backticks. Ex: Source - ahj@# Target - ` ahj@# ` How do I enclose the column names this way in StreamSets?
Venkat
  • 31
  • 1
2
votes
1 answer

How to add Custom Processor to StreamSets

I have a StreamSets container in Docker Compose and a JAR file, which were created according to the tutorial -…
Vadim
  • 753
  • 8
  • 22