Questions tagged [data-integration]

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transforming and delivery of data from a variety of sources.

It is a major topic in IT because it ultimately aims to make all systems work together seamlessly.

Example with a data warehouse

Integration must take place among the organization's primary transaction systems before data arrives at the data warehouse.
It is rarely complete unless the organization has a comprehensive, centralized master data management (MDM) system.

Data integration usually takes the form of conforming dimensions and facts in the data warehouse. Conforming dimensions means establishing common dimensional attributes across separate databases. Conforming facts means agreeing on common business metrics, such as key performance indicators (KPIs), across separate databases so that these numbers can be compared mathematically.
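
As a rough illustration of conforming dimensions and facts, the sketch below takes rows from two separate source systems, maps their differing region labels onto one shared attribute, and converts their revenue figures to a common unit so the totals can be added. Every name and value here (the `crm_sales` and `erp_sales` rows, the field names, `REGION_MAP`) is hypothetical and invented for the example; a real warehouse would do this in its ETL layer.

```python
# Hypothetical rows from two separate source systems (illustrative only).
crm_sales = [
    {"region": "N. America", "revenue_usd": 1200.0},
    {"region": "Europe", "revenue_usd": 800.0},
]
erp_sales = [
    {"region_code": "NA", "revenue_cents": 30000},
    {"region_code": "EU", "revenue_cents": 45000},
]

# Conformed dimension: every source label maps to one shared region name.
REGION_MAP = {
    "N. America": "North America",
    "NA": "North America",
    "Europe": "Europe",
    "EU": "Europe",
}


def conform_crm(row):
    """Map a CRM row onto the shared region attribute; revenue is already in USD."""
    return {"region": REGION_MAP[row["region"]], "revenue_usd": row["revenue_usd"]}


def conform_erp(row):
    """Map an ERP row onto the shared region attribute and convert cents to USD."""
    return {
        "region": REGION_MAP[row["region_code"]],
        "revenue_usd": row["revenue_cents"] / 100,
    }


def revenue_by_region(rows):
    """Sum the conformed revenue fact per conformed region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue_usd"]
    return totals


conformed = [conform_crm(r) for r in crm_sales] + [conform_erp(r) for r in erp_sales]
totals = revenue_by_region(conformed)
print(totals)
```

Once both sources share the same region names and the same revenue unit, the per-region totals can be compared or summed directly, which is the point of conforming.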

332 questions
112
votes
7 answers

Apache Kafka vs Apache Storm

Apache Kafka: distributed messaging system. Apache Storm: real-time message processing. How can we use both technologies in a real-time data pipeline for processing event data? In terms of a real-time data pipeline, both seem to me to do the job…
Ananth Duari
  • 2,859
  • 11
  • 35
  • 42
20
votes
10 answers

How do you manage databases during development?

My development team of four people has been facing this issue for some time now: Sometimes we need to be working off the same set of data. So while we develop on our local computers, the dev database is connected to remotely. However, sometimes we…
user94154
  • 16,176
  • 20
  • 77
  • 116
9
votes
2 answers

Use JSON Input step to process uneven data

I'm trying to process the following with a JSON Input step: {"address":[ {"AddressId":"1_1","Street":"A Street"}, {"AddressId":"1_101","Street":"Another Street"}, {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"}, …
rsilva4
  • 1,915
  • 1
  • 23
  • 39
8
votes
3 answers

Are there any ETL tools that integrate with Rails models?

I'm researching ETL tools to import flat files into a database and subsequently export xml files. Many of the tools support generating code to use in your application; however, I haven't found any that support using code already in your…
Kyle West
  • 8,934
  • 13
  • 65
  • 97
6
votes
3 answers

Send Email with pentaho PDI

I want to send an email using PDI. I created a job and added the 'Mail' element. These are my parameters: Server: smtp.gmail.com, Port: 587, Use Authentication, User: mygmailusername, Pass: mygmailpass, Secure Con Type: TLS. When I run the job I get…
flexxxit
  • 2,440
  • 5
  • 42
  • 69
5
votes
1 answer

Data Integration

I have been looking at the data integration methods Global as View and Local as View, but I cannot find any examples of how queries would be formed for these. Could anyone give me examples of how these methods of data integration can be queried using…
AlanFoster
  • 8,156
  • 5
  • 35
  • 52
5
votes
1 answer

Apache Nifi/Cassandra - how to load CSV into Cassandra table

I have various CSV files arriving several times per day, storing time-series data from sensors, which are parts of sensor stations. Each CSV is named after the sensor station and sensor id from which it comes, for instance…
Piar
  • 93
  • 1
  • 8
5
votes
1 answer

Using Kafka for Data Integration with Updates & Deletes

So a little background - we have a large number of data sources ranging from RDBMS's to S3 files. We would like to synchronize and integrate this data with other various data warehouses, databases, etc. At first, this seemed like the canonical model…
archeezee
  • 411
  • 1
  • 4
  • 17
5
votes
1 answer

Expose Talend ETL Job as a Web Service

I am currently evaluating Talend ETL (Talend Open Studio for Data Integration). I would like to know how, or if, I can expose an ETL Job as a Web Service. I know I can export jobs as web services and invoke them through a specific URL; however, my goal…
tpanagopoulos
  • 137
  • 2
  • 3
  • 11
4
votes
2 answers

How to use Pentaho Data Integration to copy columns between tables

I thought this would be an easy task, but since I am new to PDI, I have not yet found which transform to choose to accomplish the following: I am using Pentaho Data Integration (formerly Kettle), Community Edition, to map/copy values from one…
juniper
  • 311
  • 5
  • 13
4
votes
1 answer

How do I bring data together from multiple databases?

BACKGROUND: I should preface this by saying I'm not trying to get someone to do my work for me. I feel like I'm at a bit of a crossroad where there are multiple ways to get to my goal, but I'm not sure which ones are 'standard' and/or if my…
Aaron Anodide
  • 16,906
  • 15
  • 62
  • 121
4
votes
0 answers

How to retrieve data from webservice with pagination using pentaho data-integration tool?

I am attempting to use the rest client to query a webservice for data. The flow is as follows: POST request with the query that returns a cursor Id (Get Initial Cursor) GET request with the cursor ID to retrieve the first batch of 5000 rows Along…
Shoan
  • 4,003
  • 1
  • 26
  • 29
4
votes
1 answer

How to handle Slowly Changing Dimension in Amazon Redshift using Pentaho?

Since Amazon Redshift is optimized for reading instead of writing, how can I manage a Slowly Changing Dimension procedure using an ETL tool, in my case Pentaho Data Integration? As the ETL tool would do updates/inserts (Dimension Lookup/Update) line…
Lucas Rezende
  • 564
  • 1
  • 7
  • 18
4
votes
1 answer

Pentaho Data Integration User Defined Java Class

I created a simple Java class and exported it to a jar: package test; public class Test { public Test() { // TODO Auto-generated constructor stub } } I added the jar file to the lib folder in Pentaho (there are many jar files). As the next step I want to…
Michał Orliński
  • 1,308
  • 13
  • 15
4
votes
1 answer

Reusing transformations with different data in Pentaho data integration Kettle

I'm working with Pentaho Kettle (PDI) and I'm trying to manage a flow in which a few transformations should work as if they were functions. I'll be more specific. I've created some transformations that make some modifications on a few…
giogix
  • 769
  • 1
  • 12
  • 32