8

I have this data stored in S3 as .csv (but it can be any other file format which is the best suitable for my requirement):

"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",

and I would like to have one coordinate per column:

Like this:

Coordinates:

1.  41.9100687,-87.8805614

2.  41.9802511,-87.8803253

3.  41.9806802,-87.8792417

After importing S3 I choose CSV as data type... And then I add string column.

enter image description here

But instead I get some weird table output. Beside this I tried to import this as plain txt file with comma delimiter.. I get same weird output.

enter image description here

What am I doing wrong here?

EDIT

This test column screenshot is query from another but identical example. There should be gps_coordinates

harunB10
  • 4,823
  • 15
  • 63
  • 107

1 Answers1

14

To reproduce your situation, I did the following:

  • Created a text file using your sample data (gps.txt)
  • Uploaded it to an Amazon S3 bucket in its own folder (with no other files in that folder)
  • Created a table in Amazon Athena
    • Specified the location as the folder name (s3://my-bucket/gps/)
    • Specified 7 columns (since there are 7 string values in your sample file)

However, since the data has commas within each pair of numbers, I changed the SerDe to OpenCSVSerDe for Processing CSV - Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
  `c1` string,
  `c2` string,
  `c3` string,
  `c4` string,
  `c5` string,
  `c6` string,
  `c7` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\") 

LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');

I was then able to successfully query the table. A sample column value is: 41.9100687,-87.8805614

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470