1

I was following along with the tutorials for connecting Tableau to Amazon Athena and got hung up when running the query and returning the expected result. I downloaded the student-db.csv from https://github.com/aws-samples/amazon-athena-tableau-integration and uploaded the csv to a S3 bucket that I created. I can create the database within Athena however when I create a table either with the bulk add or directly from the query editor and preview with a query the data gets corrupted. and includes unexpected characters and unexpected/unnecessary punctuations and sometimes all the data is aggregated into a single column and also contains metadata such as "1 ?20220830_185102_00048_tnqre"0 2 ?hive" 3 Query Plan* 4 Query Plan2?varchar8 @H?P?". Also with my Athena - Tableau connected receiving the same issues when I preview the table that was created with Athena and stored in my bucket.

CREATE EXTERNAL TABLE IF NOT EXISTS student(
  `school` string, 
  `country` string, 
  `gender` string, 
  `age` string, 
  `studytime` int, 
  `failures` int, 
  `preschool` string, 
  `higher` string, 
  `remotestudy` string, 
  `health` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://jj2-test-bucket/'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'skip.header.line.count'='1', 
  'transient_lastDdlTime'='1595149168')
SELECT * FROM "studentdb"."student" limit 10;

Query preview

JacobJ215
  • 11
  • 1
  • Are you seeing this issue in Athena or Tableau. I am able to query the data fine using the DDL in Athena. – Prabhakar Reddy Sep 01 '22 at 03:29
  • I am seeing this issue on both Athena and Tableau, I also ran this on my personal computer from a different AWS account and it produced the same result. – JacobJ215 Sep 01 '22 at 03:59
  • I see that someone else had a similar issue with the metadata in Athena (https://stackoverflow.com/questions/51100316/aws-athena-csv-metadata-delimiter-changed-after-first-query-use?rq=1) their solution was to create a separate bucket to store the .csv from the query-result. I will give this a try a little later. – JacobJ215 Sep 01 '22 at 04:05

1 Answers1

0

The solution is to create a separate S3 bucket to house the query results. Additionally, when connecting to Tableau you must set the S3 Staging Directory to the location of the Query Result bucket rather than connecting to the S3 bucket that contains your raw data/csv

JacobJ215
  • 11
  • 1