Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to run very quickly against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.
Questions tagged [amazon-redshift-spectrum]
291 questions
31 votes · 5 answers
Athena vs Redshift Spectrum
I am evaluating Athena and Redshift Spectrum. Both serve the same purpose; Spectrum needs a Redshift cluster in place, whereas Athena is purely serverless. Athena uses Presto, and Spectrum uses Redshift's engine.
Are there any specific…

Mukund · 916 · 2 · 11 · 18

24 votes · 5 answers
AWS Glue: How to handle nested JSON with varying schemas
Objective:
We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum.
Background:
The JSON data is from DynamoDB Streams and is deeply…

ehelander · 243 · 1 · 2 · 5
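
A minimal sketch for the question above, assuming hypothetical names (spectrum_schema, the S3 path, and a simplified DynamoDB Streams layout); Redshift Spectrum can map nested JSON with struct columns and the OpenX JSON SerDe:

create external table spectrum_schema.ddb_stream_events (
  eventid   varchar(64),
  eventname varchar(16),
  dynamodb  struct<
    keys:struct<id:struct<s:varchar(64)>>,
    newimage:struct<status:struct<s:varchar(32)>>
  >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 's3://my-bucket/dynamodb-stream-export/';

-- Nested fields are addressed through a table alias:
select e.eventid, e.dynamodb.newimage.status.s
from spectrum_schema.ddb_stream_events e;

With this SerDe, keys missing from a given record read as NULL and undeclared keys are ignored, which is one way to tolerate records whose schemas vary; genuinely new fields still require extending the DDL (or letting a Glue crawler update the catalog).
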
14 votes · 2 answers
Redshift Spectrum: Automatically partition tables by date/folder
We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure:
|--reportDate-
|-- part0.csv.gz
|-- part1.csv.gz
We want to be able to run reports partitioned by daily…

GoatInTheMachine · 3,583 · 3 · 25 · 35
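
A sketch of the usual pattern for the question above, with hypothetical names; Spectrum does not discover partitions by itself, so each day's folder is registered explicitly (typically from the upload job, a scheduled Lambda, or a Glue crawler):

create external table spectrum_schema.daily_report (
  col1 varchar(256),
  col2 varchar(256)
)
partitioned by (reportdate date)
row format delimited fields terminated by ','
stored as textfile
location 's3://my-bucket/reports/';

-- Run once per day after the CSVs land (gzip files are read transparently):
alter table spectrum_schema.daily_report
add if not exists partition (reportdate = '2018-01-15')
location 's3://my-bucket/reports/2018-01-15/';
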
12 votes · 5 answers
How to escape single quotes in UNLOAD
conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\
.format(dbname,port,user,password,host_url)
sql="""UNLOAD ('select col1,col2 from %s.visitation_hourly_summary_us where col4= '2018-07-10' and col5=…

Mukesh Marimuthu · 135 · 1 · 7
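
For the question above: inside UNLOAD the SELECT is itself a single-quoted string, so literal single quotes are escaped by doubling them. A sketch with placeholder schema, bucket, and IAM role (the table and column names come from the snippet):

unload ('select col1, col2
from myschema.visitation_hourly_summary_us
where col4 = ''2018-07-10''')
to 's3://my-bucket/unload/visitation_'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
delimiter ','
gzip;

In the Python template, each doubled quote is written literally inside the triple-quoted string (''2018-07-10''), so it survives .format() unchanged.
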
11 votes · 5 answers
Offloading data files from Amazon Redshift to Amazon S3 in Parquet format
I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have explored everywhere but couldn't find anything about how to offload the files from…

Teja · 13,214 · 36 · 93 · 155
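
Since this question was asked, Redshift has added native Parquet output to UNLOAD. A sketch with placeholder names (myschema.my_table, the bucket, the role ARN, and the optional partition column event_date):

unload ('select * from myschema.my_table')
to 's3://my-bucket/parquet/my_table_'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
format as parquet
partition by (event_date);

PARTITION BY writes Hive-style key=value prefixes under the target path, which Spectrum can then use for partition pruning.
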
10 votes · 1 answer
Use an external table defined in the Glue Data Catalog with Redshift Spectrum
I have a table defined in the Glue Data Catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in the Glue Data Catalog from Redshift?
What will be the create external table…

Abhay Dubey · 549 · 2 · 7 · 18
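
For the question above: no per-table CREATE EXTERNAL TABLE is needed for tables the Glue Data Catalog already knows about; an external schema mapped to the Glue database exposes them all. A sketch with hypothetical names (glue_spectrum, my_glue_database, the role ARN, and the table/column names in the join):

create external schema glue_spectrum
from data catalog
database 'my_glue_database'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
create external database if not exists;

-- The Athena-visible table can now be joined with local Redshift tables:
select c.customer_id, e.event_type
from local_schema.customers c
join glue_spectrum.events e on e.customer_id = c.customer_id;
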
8 votes · 2 answers
Load Parquet files into Redshift
I have a bunch of Parquet files on S3 and I want to load them into Redshift in the most optimal way.
Each file is split into multiple chunks… what is the most optimal way to load data from S3 into Redshift?
Also, how do you create the target table…

Richard · 381 · 2 · 4 · 22
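
A sketch for the question above, with an assumed table definition; COPY reads every object under the prefix in parallel, and for Parquet it maps file columns to table columns by position, so the target table must be created first with matching column order:

create table analytics.page_views (
  view_id   bigint,
  user_id   bigint,
  viewed_at timestamp
);

copy analytics.page_views
from 's3://my-bucket/parquet/page_views/'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
format as parquet;
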
7 votes · 1 answer
Date field transformation from an AWS Glue table to a Redshift Spectrum external table
I am trying to expose a JSON dataset in S3, catalogued with a Glue table schema, as a Redshift Spectrum external table for data analysis. While creating the external tables, how do I transform the DATE fields?
I need to highlight that the source data is coming from MongoDB in ISODate…

SunSmiles · 186 · 9
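
One common approach for the question above (a sketch, with hypothetical names): declare the ISODate strings as varchar in the external table, since the Glue crawler typically types them as strings, and parse them at query time:

create external table spectrum_schema.mongo_events (
  event_id   varchar(64),
  created_at varchar(32)   -- e.g. 2018-07-10T12:34:56.000Z
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 's3://my-bucket/mongo-export/';

-- Strip the 'T' separator and the sub-second/timezone suffix, then cast:
select event_id,
       cast(replace(substring(created_at, 1, 19), 'T', ' ') as timestamp) as created_ts
from spectrum_schema.mongo_events;
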
6 votes · 2 answers
Does Amazon Redshift have its own storage backend?
I'm new to Redshift and would like some clarification on how Redshift operates:
Does Amazon Redshift have its own backend storage platform, or does it depend on S3 to store the data as objects, with Redshift used only for querying, processing and…

Durgaprasad · 159 · 2 · 9

6 votes · 3 answers
Is there a way to describe an external/Spectrum table via Redshift?
In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular…

New Alexandria · 6,951 · 4 · 57 · 77
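
A sketch for the question above using Redshift's system views for external tables; 'spectrum_schema' and 'my_table_name' are placeholders:

-- Column-level definition, including which columns are partition keys:
select columnname, external_type, part_key
from svv_external_columns
where schemaname = 'spectrum_schema'
  and tablename  = 'my_table_name'
order by columnnum;

-- Table-level properties (location, input format, serde):
select location, input_format, serialization_lib
from svv_external_tables
where schemaname = 'spectrum_schema'
  and tablename  = 'my_table_name';
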
6 votes · 0 answers
Serde serialization lib is null when the Glue crawler crawls a Redshift table
I tried to create a Glue crawler that crawls a Redshift table. The Glue crawler executes successfully and creates an external table. But when I look at the metadata of the table, I find that "Input format", "Output format", "Serde name" and "Serde…

trp86 · 414 · 1 · 7 · 21

6 votes · 2 answers
Trouble Partitioning my Amazon Spectrum Table
Getting this error in particular:
ERROR: Error when calling external catalog API: The number of partition keys do not match the number of partition values

Aviv Goldgeier · 799 · 7 · 23
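
The error above usually means an ALTER TABLE … ADD PARTITION call supplies a different number of values than the table's PARTITIONED BY clause declares. A hypothetical two-column illustration:

create external table spectrum_schema.events (
  event_id varchar(64)
)
partitioned by (event_year int, event_month int)
stored as parquet
location 's3://my-bucket/events/';

-- Every ADD PARTITION must name both partition columns, in the declared order:
alter table spectrum_schema.events
add if not exists partition (event_year = 2019, event_month = 6)
location 's3://my-bucket/events/2019/06/';
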
6 votes · 1 answer
What are the steps to use Redshift Spectrum?
Currently I am using Amazon Redshift as well as Amazon S3 to store data. Now I want to use Spectrum to improve performance, but I am confused about how to use it properly.
If I am using SQL Workbench, can I create an external schema from the same, or do I need to create…

Pratik Rawlekar · 327 · 4 · 14
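
A condensed sketch of the usual three steps for the question above, runnable from any SQL client (including SQL Workbench); schema, database, role, table, and column names are all placeholders:

-- 1. External schema backed by a Glue/Athena data catalog database.
create external schema spectrum_schema
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
create external database if not exists;

-- 2. External table over files already in S3 (no data is loaded into Redshift).
create external table spectrum_schema.sales (
  sale_id bigint,
  amount  decimal(12,2),
  sold_at timestamp
)
stored as parquet
location 's3://my-bucket/sales/';

-- 3. Query it like any other table, including joins with local tables.
select count(*) from spectrum_schema.sales where sold_at >= '2019-01-01';
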
5 votes · 1 answer
How does Redshift Spectrum scan data?
Given a data source of 1.4 TB of Parquet data on S3 partitioned by a timestamp field (so partitions are year - month - day), I am querying a specific day of data (2.6 GB) and retrieving all available fields in the Parquet files via Redshift…

Vzzarr · 4,600 · 2 · 43 · 80
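
One way to see what Spectrum actually read for a query like the one above (partition pruning plus Parquet column projection) is the scan summary view; a sketch:

select query, segment, elapsed,
       s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows
from svl_s3query_summary
where query = pg_last_query_id()
order by segment;

Selecting every column defeats much of Parquet's columnar advantage, so s3_scanned_bytes can approach the full size of the partitions touched even when only one day is queried.
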
5 votes · 0 answers
Quote escaped quotes in Redshift external tables
I'm trying to create an external table in Redshift from a CSV that has quote-escaped quotes in it, as documented in RFC 4180:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it…

Tom Rea · 51 · 1 · 2
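
A sketch for the question above using the Hive OpenCSVSerde, which Redshift Spectrum supports and which understands RFC 4180-style doubled quotes inside quoted fields; table, column, and location names are placeholders:

create external table spectrum_schema.quoted_csv (
  col1 varchar(256),
  col2 varchar(256)
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
stored as textfile
location 's3://my-bucket/quoted-csv/';

Note that this SerDe reads values as strings, so it's common to declare varchar columns and cast to other types at query time.
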