I have a CSV file that looks like this:

20×2 DataFrame
│ Row │ Id    │ Date       │
│     │ Int64 │ String     │
├─────┼───────┼────────────┤
│ 1   │ 1     │ 01-01-2010 │
│ 2   │ 2     │ 02-01-2010 │
│ 3   │ 3     │ 03-01-2010 │
│ 4   │ 4     │ 04-01-2010 │
│ 5   │ 5     │ 05-01-2010 │
│ 6   │ 6     │ 06-01-2010 │
│ 7   │ 7     │ 07-01-2010 │
│ 8   │ 8     │ 08-01-2010 │
│ 9   │ 9     │ 09-01-2010 │
│ 10  │ 10    │ 10-01-2010 │
│ 11  │ 11    │ 11-01-2010 │
│ 12  │ 12    │ 12-01-2010 │
│ 13  │ 13    │ 13-01-2010 │
│ 14  │ 14    │ 14-01-2010 │
│ 15  │ 15    │ 15-01-2010 │
│ 16  │ 16    │ 16-01-2010 │
│ 17  │ 17    │ 17-01-2010 │
│ 18  │ 18    │ 18-01-2010 │
│ 19  │ 19    │ 19-01-2010 │
│ 20  │ 20    │ 20-01-2010 │

After reading the CSV file, the Date column has type String. How can I convert a String column (or vector) into a Date/DateTime column after loading? The DataFrames.jl docs don't seem to say anything about time series. Also, is there a way to specify date columns while reading the CSV file?

Mohamed Thasin ah

2 Answers

10

When reading a CSV file, you can pass the `dateformat` keyword argument to CSV.jl:

using CSV, DataFrames
CSV.File("your_file_name.csv", dateformat="dd-mm-yyyy") |> DataFrame

Alternatively, if the data is already loaded into a data frame called df, you can convert the String column to Date like this:

using Dates
df.Date = Date.(df.Date, "dd-mm-yyyy")
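
As a minimal, self-contained sketch of the conversion above, the same broadcasted `Date` call can be tried on a plain vector using only the `Dates` standard library. The sample strings are copied from the question; the names `raw`, `fmt`, `dates`, and `span` are only illustrative.

```julia
using Dates

# A few values in the same dd-mm-yyyy layout as the question's Date column
raw = ["01-01-2010", "02-01-2010", "20-01-2010"]

# Build the format object once and reuse it for every element
fmt = dateformat"dd-mm-yyyy"

# Broadcast the Date constructor over the vector of strings
dates = Date.(raw, fmt)

# The result is a Vector{Date}, so date arithmetic now works
span = dates[end] - dates[1]   # a Dates.Day period
```

Building the format with `dateformat"..."` up front, rather than passing a plain string, may help on long columns because the format is not re-interpreted for each element.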
Bogumił Kamiński
  • Is there any specific documentation for time series analysis? A link would be great. Since most of my data are time series, I want to take a detailed look at the documentation. – Mohamed Thasin ah Jul 06 '20 at 08:00
  • There is a `Dates` module in the standard library: https://docs.julialang.org/en/latest/stdlib/Dates/. There is also the https://juliastats.org/TimeSeries.jl/latest/ package (note, though, that it is a specific container type, so if you use it you probably do not need DataFrames.jl). The integration between `TimeArray` and `DataFrame` is described here: https://juliastats.org/TimeSeries.jl/latest/tables/#Tables.jl-Interface-Integration-1. – Bogumił Kamiński Jul 06 '20 at 08:07
  • I have one more question: why does it take a long time to execute the first time I import a package or perform some operation? Is this normal, or is there something I have to do on my machine? I guess it's compiling or doing something else. Unlike Python, Julia is not a scripting language; it performs compilation. Could you explain the behaviour? If you want, I can also post this as a separate question. – Mohamed Thasin ah Jul 06 '20 at 08:15
  • I think it is worth a separate question if you want to know the details. In short, you are right: the first time a function is called with a specific signature, it gets compiled. This happens only once per signature. The core developers are actively working on reducing this latency. There is already an option to cut it down using https://github.com/JuliaLang/PackageCompiler.jl, but it is not a full solution for every use case. Note, e.g., that standard libraries like `Statistics` or `Dates` do not have this latency because they use such an approach. – Bogumił Kamiński Jul 06 '20 at 08:25
  • I will post this as a separate question, because I would like to know how Julia works in detail. I will come up with a detailed use case and we can discuss it again in a new thread. – Mohamed Thasin ah Jul 06 '20 at 08:30
1

Here is how I have done it:

First, a helper function that converts a string, or a vector of strings of various string types, to `DateTime`. The format string must match your data.

using CSV, Dates, InlineStrings  # InlineStrings provides String31

parse_date(d::AbstractString) = DateTime(d, dateformat"yyyy-mm-dd HH:MM:SS")
parse_date(v::Vector{AbstractString}) = parse_date.(v)
parse_date(v::Vector{String}) = parse_date.(v)
parse_date(v::Vector{String31}) = parse_date(String.(v))

using Pipe, TimeSeries

prices = @pipe CSV.File(filename; header = 1, delim = ",") |>
         TimeArray(_; timestamp = :Date, timeparser = parse_date)
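
The helper above needs one method per vector type because Julia's parametric types are invariant: `Vector{String}` is not a subtype of `Vector{AbstractString}`. As a hedged sketch (not the answer's original code), a single method on `AbstractVector{<:AbstractString}` should cover those cases at once. The names `FMT`, `parse_ts`, and `stamps` are hypothetical, and only the `Dates` standard library is used.

```julia
using Dates

# Same timestamp layout as the helper above; change it to match your data
const FMT = dateformat"yyyy-mm-dd HH:MM:SS"

# Scalar method: parse one timestamp string
parse_ts(s::AbstractString) = DateTime(s, FMT)

# Vector method: the `<:` makes this accept vectors of any string type
parse_ts(v::AbstractVector{<:AbstractString}) = parse_ts.(v)

# Works on a Vector{String} ...
stamps = parse_ts(["2010-01-01 09:30:00", "2010-01-02 16:00:00"])

# ... and on a Vector{SubString{String}}, as returned by split
same = parse_ts(split("2010-01-01 09:30:00,2010-01-02 16:00:00", ","))
```

A function like this could be passed as `timeparser` in place of `parse_date`, assuming the column's element type is some `AbstractString`.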