2

geniouses. I am a newbie in Julia, but have an ambitious. I am trying to a following stream so far, of course it's an automatic process.

  1. read data from csv file to DataFrames
  2. Checking the data then cerate DB tables due to DataFrames data type
  3. Insert data from DataFrames to the created table ( eg. SQLite )

I am sticking at No.2 now, because, for example the column's data type 'Vector{String15}'. I am struggling how can I reflect the datatype to the query of creating table. I mean I could not find any solutions below (a) (b).

  fname = string( @__DIR__,"/","testdata/test.csv")
  df = CSV.read( fname, DataFrame )
  last = ncol(df)
  for i = 1:last
      col[i] = typeof(df[!,i])  # ex. Vector{String15}
      if String == col[i]       # (a) does not work
         # create table sql
         # expect
         query = 'create table testtable( col1 varchar(15),....'
      elseif Int == col[i]      # (b) does not work
         # create table sql
         # expect
         query = 'create table testtable( col1 int,....'
      end
         ・
     ・
  end

I am wonderring,

  1. I really have to get the type of table column from 'Vector{String15}' anyhow?
  2. Does DataFrames has an utility method to do it?
  3. Should combine with other module to do it?

I am expecting smart tips by you, thanks any advances.

onoke
  • 79
  • 6

1 Answers1

2

Here is how you can do it both ways:

julia> using DataFrames

julia> using CSV

julia> df = CSV.read("test.csv", DataFrame)
3×3 DataFrame
 Row │ a             b      c
     │ String15     Int64  Float64
─────┼─────────────────────────────
   1 │ a1234567890      1      1.5
   2 │ b1234567890     11     11.5
   3 │ b1234567890    111    111.5

julia> using SQLite

julia> db = SQLite.DB("test.db")
SQLite.DB("test.db")

julia> SQLite.load!(df, db, "df")
"df"

julia> SQLite.columns(db, "df")
(cid = [0, 1, 2], name = ["a", " b", " c"], type = ["TEXT", "INT", "REAL"], notnull = [1, 1, 1], dflt_value = [missing, missing, missing], pk = [0, 0, 0])

julia> query = DBInterface.execute(db, "SELECT * FROM df")
SQLite.Query(SQLite.Stmt(SQLite.DB("test.db"), 4), Base.RefValue{Int32}(100), [:a, Symbol(" b"), Symbol(" c")], Type[Union{Missing, String}, Union{Missing, Int64}, UnionMissing, Float64}], Dict(:a => 1, Symbol(" c") => 3, Symbol(" b") => 2), Base.RefValue{Int64}(0))

julia> DataFrame(query)
3×3 DataFrame
 Row │ a             b      c
     │ String       Int64  Float64
─────┼─────────────────────────────
   1 │ a1234567890      1      1.5
   2 │ b1234567890     11     11.5
   3 │ b1234567890    111    111.5

If you would need more explanations this is covered in chapter 8 of Julia for Data Analysis. This chapter should be available on MEAP in 1-2 weeks (and the source code is already available at https://github.com/bkamins/JuliaForDataAnalysis)

Bogumił Kamiński
  • 66,844
  • 3
  • 80
  • 107
  • Big thanks Dr. Kaminski, I did not have an image that the solution would be pointed by the Super Author. I could followd your solution, but 'puzzles' in the sql, it should be "df", should not it? :-) I have noticed it was perfectly different approach with my idea. Thanks a lot again, I will buy your MEAP book. – onoke Jun 22 '22 at 02:01
  • Yes `"puzzles"` should be `"df"` - fixed. AFAICT you can read chapters in MEAP without having to buy the book. – Bogumił Kamiński Jun 22 '22 at 06:13
  • 1
    >you can read chapters in MEAP without having to buy the book I think it is a bad manner for a big worker. ;) – onoke Jun 23 '22 at 04:00