Questions tagged [etl]

ETL is an acronym for Extract, Transform, and Load. It refers to a process of extracting data from source systems, transforming the data in some way (manipulating it, filtering it, combining it with other sources), and finally loading the transformed data to target system(s).

ETL is a process prominent especially in data warehousing. The term is an industry standard, representing data movement and transformation. It performs the following functions:

Extract data from homogeneous or heterogeneous data sources

Transform data - cleansing, reformatting, standardization, aggregate, join with other data, and apply business logic

Load the data in specified target tables, systems, or flat files.

The ETL processes are reusable components that can be scheduled to perform data movement on particular time intervals. Used as batch processes, ETL jobs support massive parallel processing of data. Generally they are easy to maintain and scalable.

It is used for movement across systems when a high volume of data and complex business rules are involved.

Big Data and the ELT approach (yes, ELT)

Traditional data integration was performed using batch processing (data on the rest), while big data integration can be done in real-time or with batch processing. Which make the ETL phases reordered to become ELT in some cases, so the data is extracted, loaded into distributed file systems, and then transformed before being used.

Commonly used ETL tools are, for example:

  • SQL Server Integration Services (SSIS)
  • Informatica PowerCenter
  • IBM Infosphere DataStage
  • Syncsort DMX / DMX-h
  • Oracle Data Integrator
  • Oracle Warehouse Builder
  • SAS Data Management
  • CloverETL

Open-source solutions are available as well, such as

  • Pentaho Data Integration (Kettle)
  • Talend Open Studio.

References:

5837 questions
265
votes
13 answers

MySQL - Rows to Columns

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X). I have a table (called history) with 3 columns: hostid, itemname, itemvalue. If I do a select (select * from history), it will return …
Bob Rivers
  • 5,261
  • 6
  • 47
  • 59
150
votes
5 answers

What does 'killed' mean when processing a huge CSV with Python, which suddenly stops?

I have a Python script that imports a large CSV file and then counts the number of occurrences of each word in the file, then exports the counts to another CSV file. But what is happening is that once that counting part is finished and the exporting…
user1893354
  • 5,778
  • 12
  • 46
  • 83
119
votes
11 answers

How to import load a .sql or .csv file into SQLite?

I need to dump a .sql or .csv file into SQLite (I'm using SQLite3 API). I've only found documentation for importing/loading tables, not entire databases. Right now, when I type: sqlite3prompt> .import FILENAME TABLE I get a syntax error, since…
happythenewsad
  • 1,765
  • 3
  • 15
  • 17
117
votes
6 answers

How to convert result table to JSON array in MySQL

I'd like to convert result table to JSON array in MySQL using preferably only plain MySQL commands. For example with query SELECT name, phone FROM person; | name | phone | | Jack | 12345 | | John | 23455 | the expected JSON output would be [ { …
ronkot
  • 5,915
  • 4
  • 27
  • 41
89
votes
3 answers

Error 0xc0202049: Data Flow Task 1: Failure inserting into the read-only column

I am using SQL Server 2008 import and export wizard. I need to import a database. I opened the SQL server import/export wizard and I went through the following actions:- for the destination I chose "SQL server native client 10". then I selected…
John John
  • 1
  • 72
  • 238
  • 501
64
votes
15 answers

Import Excel spreadsheet columns into SQL Server database

I have an Excel spreadsheet that I want to import select columns into my SQL Server 2008 database table. The wizard didn't offer that option. Do any easy code options exist?
user47206
  • 641
  • 1
  • 6
  • 3
52
votes
3 answers

sqlite insert into table select * from

I need to move data from one table to another in my Android app I would like to use the following sql: insert into MYTABLE2 select id, STATUS risposta, DATETIME('now') data_ins from MYTABLE 2 Unfortunately in table MYTABLE2 there is an _ID…
Gyonder
  • 3,674
  • 7
  • 32
  • 49
47
votes
3 answers

How to set dependencies between DAGs in Airflow?

I am using Airflow to schedule batch jobs. I have one DAG (A) that runs every night and another DAG (B) that runs once per month. B depends on A having completed successfully. However B takes a long time to run and so I would like to keep it in a…
Conor
  • 1,509
  • 2
  • 20
  • 28
44
votes
4 answers

SSIS Extension for Visual Studio 2022

I have downloaded and installed Visual Studio 2022. Then after click on modify Now, I want to create SSIS package, for this I have started VS22 and in "manage extensions" when I try to find Microsoft SSIS, I am unable to find. For this, I have…
38
votes
14 answers

convert Excel Date Serial Number to Regular Date

I got a column called DateOfBirth in my csv file with Excel Date Serial Number Date Example: 36464 37104 35412 When i formatted cells in excel these are converted as 36464 => 1/11/1999 37104 => 1/08/2001 35412 => 13/12/1996 I need…
Sreedhar
  • 29,307
  • 34
  • 118
  • 188
37
votes
3 answers

What is a staging table?

Are staging tables used only in Data warehouse project or in any SSIS Project? I would like to know what is a staging table? Can anyone give me some examples on how to use it and in what circumstances it is implemented? Also, may I please know the…
Suj
  • 416
  • 1
  • 5
  • 11
37
votes
3 answers

Which user credentials does Integration Services Catalog use to execute packages?

We have an SSIS package which reads text files from a shared network directory. When I execute this package in SSDT it works fine. When we however deploy the project to the Integration Services Catalog and attempt to run the same package from there…
JohnnyLaw
  • 373
  • 1
  • 3
  • 4
36
votes
6 answers

F# and "enterprise-level" reporting

Based on your actual experience, a whitepaper or other respected referenceable study, is F# currently a viable tool for corporate-/enterprise-level reporting? Attention: Before voting to close this question as "not constructive", please read the bit…
JDB
  • 25,172
  • 5
  • 72
  • 123
35
votes
7 answers

Why is my SSIS toolbox empty in Visual Studio 2019 community?

I installed Visual Studio 2019 Community and then installed data tools. I can open an Integration Services project but when I look at the SSIS Tooolbox, it's empty. How do I fix this?
JJ.
  • 9,580
  • 37
  • 116
  • 189
34
votes
10 answers

SSIS Data Flow Task hangs on excecution of Pre-excecute phase

I have a Data Flow Task that is hanging on excecution. The flow is simple, makes two queries to different tables (Both with a couple of joins), then sorts and merges the otuputs through a common id, adds a static column to all the records, saves…
Ryoku
  • 792
  • 1
  • 5
  • 18
1
2 3
99 100