I'm having trouble finding an example to follow online for this simple use-case:

Load a CSV file from an S3 object location into a Julia DataFrame.

Here is what I tried that didn't work:

using AWSS3, DataFrames, CSV

filepath = S3Path("s3://muh-bucket/path/data.csv")

CSV.File(filepath) |> DataFrame             # fails

# but I am able to stat the file
stat(filepath)

#=
Status(  mode = -rw-rw-rw-,
  ...etc  
  size = 2141032 (2.0M),
  blksize = 4096 (4.0K),
  blocks = 523,
  mtime = 2021-09-01T23:55:26,
  ...etc
=#

I can also read the file to a string object locally:

data_as_string = String(AWSS3.read(filepath));
#"column_1\tcolumn_2\tcolumn_3\t...etc..."

My AWS config is in order, I can access the object from julia locally.

How do I get this into a DataFrame?

Merlin

1 Answer

Thanks to help from the nice people on the Julia Slack channel (#data), this works:

bytes = AWSS3.read(S3Path("s3://muh-bucket/path/data.csv"))

typeof(bytes)
# Vector{UInt8} (alias for Array{UInt8, 1})

df = CSV.read(bytes, DataFrame)

Bingo, I'm in business. The CSV.jl maintainer mentions that S3Path types used to work when passed to CSV.read, so perhaps this will be even simpler in the future.
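
The string output in the question suggests the file is tab-separated. CSV.jl normally detects the delimiter on its own, but it can also be passed explicitly. Here is a minimal sketch of that, assuming a small in-memory byte vector as a stand-in for what AWSS3.read returns (the column names and values are made up for illustration):

```julia
using CSV, DataFrames

# Hypothetical stand-in for the bytes returned by AWSS3.read(S3Path(...));
# the column names and values below are invented for this example.
bytes = Vector{UInt8}("column_1\tcolumn_2\tcolumn_3\n1\t2.5\ta\n2\t3.5\tb\n")

# CSV.read accepts a byte vector as its source; delim is set explicitly
# instead of relying on automatic delimiter detection.
df = CSV.read(bytes, DataFrame; delim='\t')

# An IOBuffer wrapping the same bytes is another accepted source.
df_io = CSV.read(IOBuffer(bytes), DataFrame; delim='\t')
```

Both calls should give the same DataFrame, so which one to use is mostly a matter of what the surrounding code already has in hand.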

Helpful SO post for getting AWS configs in order

Merlin