Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to run very quickly against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.
Questions tagged [amazon-redshift-spectrum]
291 questions
31 votes · 5 answers
Athena vs Redshift Spectrum
I am evaluating Athena and Redshift Spectrum. Both serve the same purpose; Spectrum needs a Redshift cluster in place, whereas Athena is purely serverless. Athena uses Presto, and Spectrum uses Redshift's engine.
Are there any specific…

Mukund · 916 · 2 · 11 · 18

24 votes · 5 answers
AWS Glue: How to handle nested JSON with varying schemas
Objective:
We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum.
Background:
The JSON data is from DynamoDB Streams and is deeply…

ehelander · 243 · 1 · 2 · 5
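
A minimal sketch for the question above, assuming hypothetical names (spectrum_schema, the S3 path, and a simplified DynamoDB Streams layout); Redshift Spectrum can map nested JSON with struct columns and the OpenX JSON SerDe:

create external table spectrum_schema.ddb_stream_events (
  eventid   varchar(64),
  eventname varchar(16),
  dynamodb  struct<
    keys:struct<id:struct<s:varchar(64)>>,
    newimage:struct<status:struct<s:varchar(32)>>
  >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 's3://my-bucket/dynamodb-stream-export/';

-- Nested fields are addressed through a table alias:
select e.eventid, e.dynamodb.newimage.status.s
from spectrum_schema.ddb_stream_events e;

With this SerDe, keys missing from a given record read as NULL and undeclared keys are ignored, which is one way to tolerate records whose schemas vary; genuinely new fields still require extending the DDL (or letting a Glue crawler update the catalog).
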
14 votes · 2 answers
Redshift Spectrum: Automatically partition tables by date/folder
We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure:
|--reportDate-
|-- part0.csv.gz
|-- part1.csv.gz
We want to be able to run reports partitioned by daily…

GoatInTheMachine · 3,583 · 3 · 25 · 35
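
A sketch of the usual pattern for the question above, with hypothetical names; Spectrum does not discover partitions by itself, so each day's folder is registered explicitly (typically from the upload job, a scheduled Lambda, or a Glue crawler):

create external table spectrum_schema.daily_report (
  col1 varchar(256),
  col2 varchar(256)
)
partitioned by (reportdate date)
row format delimited fields terminated by ','
stored as textfile
location 's3://my-bucket/reports/';

-- Run once per day after the CSVs land (gzip files are read transparently):
alter table spectrum_schema.daily_report
add if not exists partition (reportdate = '2018-01-15')
location 's3://my-bucket/reports/2018-01-15/';
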
12 votes · 5 answers
How to escape single quotes in UNLOAD
conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\
.format(dbname,port,user,password,host_url)
sql="""UNLOAD ('select col1,col2 from %s.visitation_hourly_summary_us where col4= '2018-07-10' and col5=…

Mukesh Marimuthu · 135 · 1 · 7
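
For the question above: inside UNLOAD the SELECT is itself a single-quoted string, so literal single quotes are escaped by doubling them. A sketch with placeholder schema, bucket, and IAM role (the table and column names come from the snippet):

unload ('select col1, col2
from myschema.visitation_hourly_summary_us
where col4 = ''2018-07-10''')
to 's3://my-bucket/unload/visitation_'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
delimiter ','
gzip;

In the Python template, each doubled quote is written literally inside the triple-quoted string (''2018-07-10''), so it survives .format() unchanged.
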
11 votes · 5 answers
Offloading data files from Amazon Redshift to Amazon S3 in Parquet format
I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have explored everywhere but couldn't find anything about how to offload the files from…

Teja · 13,214 · 36 · 93 · 155
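
Since this question was asked, Redshift has added native Parquet output to UNLOAD. A sketch with placeholder names (myschema.my_table, the bucket, the role ARN, and the optional partition column event_date):

unload ('select * from myschema.my_table')
to 's3://my-bucket/parquet/my_table_'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
format as parquet
partition by (event_date);

PARTITION BY writes Hive-style key=value prefixes under the target path, which Spectrum can then use for partition pruning.
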
10 votes · 1 answer
Use an external table defined in the Glue Data Catalog with Redshift Spectrum
I have a table defined in the Glue Data Catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in the Glue Data Catalog from Redshift?
What will be the create external table…

Abhay Dubey · 549 · 2 · 7 · 18
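
For the question above: no per-table CREATE EXTERNAL TABLE is needed for tables the Glue Data Catalog already knows about; an external schema mapped to the Glue database exposes them all. A sketch with hypothetical names (glue_spectrum, my_glue_database, the role ARN, and the table/column names in the join):

create external schema glue_spectrum
from data catalog
database 'my_glue_database'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
create external database if not exists;

-- The Athena-visible table can now be joined with local Redshift tables:
select c.customer_id, e.event_type
from local_schema.customers c
join glue_spectrum.events e on e.customer_id = c.customer_id;
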
8 votes · 2 answers
Load Parquet files into Redshift
I have a bunch of Parquet files on S3 and I want to load them into Redshift in the most optimal way.
Each file is split into multiple chunks… what is the most optimal way to load data from S3 into Redshift?
Also, how do you create the target table…

Richard · 381 · 2 · 4 · 22
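
A sketch for the question above, with an assumed table definition; COPY reads every object under the prefix in parallel, and for Parquet it maps file columns to table columns by position, so the target table must be created first with matching column order:

create table analytics.page_views (
  view_id   bigint,
  user_id   bigint,
  viewed_at timestamp
);

copy analytics.page_views
from 's3://my-bucket/parquet/page_views/'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
format as parquet;
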
7 votes · 1 answer
Date field transformation from an AWS Glue table to a Redshift Spectrum external table
I am trying to expose a JSON dataset in S3, catalogued with a Glue table schema, as a Redshift Spectrum external table for data analysis. While creating the external tables, how do I transform the DATE fields?
I need to highlight that the source data is coming from MongoDB in ISODate…

SunSmiles · 186 · 9
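
One common approach for the question above (a sketch, with hypothetical names): declare the ISODate strings as varchar in the external table, since the Glue crawler typically types them as strings, and parse them at query time:

create external table spectrum_schema.mongo_events (
  event_id   varchar(64),
  created_at varchar(32)   -- e.g. 2018-07-10T12:34:56.000Z
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 's3://my-bucket/mongo-export/';

-- Strip the 'T' separator and the sub-second/timezone suffix, then cast:
select event_id,
       cast(replace(substring(created_at, 1, 19), 'T', ' ') as timestamp) as created_ts
from spectrum_schema.mongo_events;
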
6 votes · 2 answers
Does Amazon Redshift have its own storage backend?
I'm new to Redshift and would like some clarification on how Redshift operates:
Does Amazon Redshift have its own backend storage platform, or does it depend on S3 to store the data as objects, with Redshift used only for querying, processing and…

Durgaprasad · 159 · 2 · 9

6 votes · 3 answers
Is there a way to describe an external/Spectrum table via Redshift?
In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular…

New Alexandria · 6,951 · 4 · 57 · 77
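
A sketch for the question above using Redshift's system views for external tables; 'spectrum_schema' and 'my_table_name' are placeholders:

-- Column-level definition, including which columns are partition keys:
select columnname, external_type, part_key
from svv_external_columns
where schemaname = 'spectrum_schema'
  and tablename  = 'my_table_name'
order by columnnum;

-- Table-level properties (location, input format, serde):
select location, input_format, serialization_lib
from svv_external_tables
where schemaname = 'spectrum_schema'
  and tablename  = 'my_table_name';
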
6 votes · 0 answers
Serde serialization lib is null when the Glue crawler crawls a Redshift table
I tried to create a Glue crawler that crawls a Redshift table. The Glue crawler executes successfully and creates an external table. But when I look at the metadata of the table, I find that "Input format", "Output format", "Serde name" and "Serde…

trp86 · 414 · 1 · 7 · 21

6 votes · 2 answers
Trouble Partitioning my Amazon Spectrum Table
Getting this error in particular:
ERROR: Error when calling external catalog API: The number of partition keys do not match the number of partition values

Aviv Goldgeier · 799 · 7 · 23
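
The error above usually means an ALTER TABLE … ADD PARTITION call supplies a different number of values than the table's PARTITIONED BY clause declares. A hypothetical two-column illustration:

create external table spectrum_schema.events (
  event_id varchar(64)
)
partitioned by (event_year int, event_month int)
stored as parquet
location 's3://my-bucket/events/';

-- Every ADD PARTITION must name both partition columns, in the declared order:
alter table spectrum_schema.events
add if not exists partition (event_year = 2019, event_month = 6)
location 's3://my-bucket/events/2019/06/';
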
6 votes · 1 answer
What are the steps to use Redshift Spectrum?
Currently I am using Amazon Redshift as well as Amazon S3 to store data. Now I want to use Spectrum to improve performance, but I am confused about how to use it properly.
If I am using SQL Workbench, can I create an external schema from the same, or do I need to create…

Pratik Rawlekar · 327 · 4 · 14
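
A condensed sketch of the usual three steps for the question above, runnable from any SQL client (including SQL Workbench); schema, database, role, table, and column names are all placeholders:

-- 1. External schema backed by a Glue/Athena data catalog database.
create external schema spectrum_schema
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
create external database if not exists;

-- 2. External table over files already in S3 (no data is loaded into Redshift).
create external table spectrum_schema.sales (
  sale_id bigint,
  amount  decimal(12,2),
  sold_at timestamp
)
stored as parquet
location 's3://my-bucket/sales/';

-- 3. Query it like any other table, including joins with local tables.
select count(*) from spectrum_schema.sales where sold_at >= '2019-01-01';
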
5 votes · 1 answer
How does Redshift Spectrum scan data?
Given a data source of 1.4 TB of Parquet data on S3 partitioned by a timestamp field (so partitions are year - month - day), I am querying a specific day of data (2.6 GB) and retrieving all available fields in the Parquet files via Redshift…

Vzzarr · 4,600 · 2 · 43 · 80
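
One way to see what Spectrum actually read for a query like the one above (partition pruning plus Parquet column projection) is the scan summary view; a sketch:

select query, segment, elapsed,
       s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows
from svl_s3query_summary
where query = pg_last_query_id()
order by segment;

Selecting every column defeats much of Parquet's columnar advantage, so s3_scanned_bytes can approach the full size of the partitions touched even when only one day is queried.
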
5 votes · 0 answers
Quote escaped quotes in Redshift external tables
I'm trying to create an external table in Redshift from a CSV that has quote-escaped quotes in it, as documented in RFC 4180:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it…

Tom Rea · 51 · 1 · 2
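
A sketch for the question above using the Hive OpenCSVSerde, which Redshift Spectrum supports and which understands RFC 4180-style doubled quotes inside quoted fields; table, column, and location names are placeholders:

create external table spectrum_schema.quoted_csv (
  col1 varchar(256),
  col2 varchar(256)
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
stored as textfile
location 's3://my-bucket/quoted-csv/';

Note that this SerDe reads values as strings, so it's common to declare varchar columns and cast to other types at query time.
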