
I am trying to extract the data for a specific stock symbol from a data set of all stocks inside a for loop. The code works when I run it outside the for loop, but the same code fails inside the loop.

Below is the code -

Working -

df = fh_5[fh_5.symbol .== "GOOG", ["date","close"]]

Not working -

for s in unique!(fh_5.symbol)
    df = fh_5[fh_5.symbol .== s, ["date","close"]]
    date_range = leftjoin(date_range, df, on =:"dates" => :"date")
end

Error

ERROR: BoundsError: attempt to access 6852038×8 DataFrame at index [Bool[1, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], ["date", "close"]]
Stacktrace:
 [1] getindex(df::DataFrame, row_inds::BitVector, col_inds::Vector{String})
   @ DataFrames ~\.julia\packages\DataFrames\3mEXm\src\dataframe\dataframe.jl:448
 [2] top-level scope
   @ .\REPL[349]:2

After I run the for loop, the code that worked outside the loop no longer works, and I have to re-import the CSV file. The code outside the for loop works only if I run it first. Am I changing the base dataset fh_5 when I run the for loop?

Just to add the reproducible example - Data for the example

Below is the code used -

using DataFrames
using DataFramesMeta
using CSV
using Dates
using Query


fh_5 = CSV.read("D:\\Julia_Dataframe\\JuliaCon2020-DataFrames-Tutorial\\fh_5yrs.csv", DataFrame)

min_date = minimum(fh_5[:, "date"])
max_date = maximum(fh_5[:, "date"])
date_seq = string.(collect(Dates.Date(min_date) : Dates.Day(1) : Dates.Date(max_date)))
date_range = df = DataFrame(dates = date_seq)
date_range.dates = Date.(date_range.dates, "yyyy-mm-dd")

for s in unique(fh_5.symbol)
    df = fh_5[fh_5.symbol .== s, ["date","close"]]
    date_range = leftjoin(date_range, df, on =:"dates" => :"date")
    rename!(date_range, Dict(:close => s))
end
  • Even if I hard-code a specific symbol like "AAPL" or "GOOG" instead of using the loop variable, just to test whether the loop variable is the problem, I still get the same error – Harneet.Lamba Apr 16 '21 at 23:33
  • Please, add a [mcve]. – Héliton Martins Apr 16 '21 at 23:38
  • I have added the reproducible example, but the code is working fine now after incorporating the suggestion from @Cameron Bieganek: I changed "unique!" to "unique" – Harneet.Lamba Apr 18 '21 at 02:13

1 Answer


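Don't use unique! for this, because it mutates the fh_5.symbol column. fh_5.symbol returns the column vector itself rather than a copy, so unique! removes the duplicate values in place and shortens that column while the other columns keep their original length. The Boolean mask fh_5.symbol .== s then has fewer elements than fh_5 has rows, which is why the indexing throws a BoundsError, and fh_5 stays in that inconsistent state until you reload it. Use unique instead, which returns a new vector and leaves fh_5 untouched. So, something like this: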

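for s in unique(fh_5.symbol)
    df = fh_5[fh_5.symbol .== s, ["date", "close"]]
    date_range = leftjoin(date_range, df, on = :dates => :date)
    # Rename the joined column, as in the question's example, so the next join does not clash on "close"
    rename!(date_range, Dict(:close => s))
end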
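To see why mutating the column matters, here is a minimal sketch on a made-up three-row data frame (the name toy and its values are hypothetical, not taken from fh_5). It checks that property access hands back the stored column vector, that toy[:, :symbol] produces a copy, and that the non-mutating unique leaves the data frame intact.

```julia
using DataFrames

# Hypothetical toy data standing in for fh_5
toy = DataFrame(symbol = ["GOOG", "AAPL", "GOOG"], close = [1.0, 2.0, 3.0])

# toy.symbol is the stored column itself, the same object as toy[!, :symbol]
same_vector = toy.symbol === toy[!, :symbol]

# toy[:, :symbol] is a fresh copy, so mutating it cannot affect toy
is_copy = toy[:, :symbol] !== toy.symbol

# unique builds a new vector; the data frame keeps all of its rows
syms = unique(toy.symbol)
rows_after = nrow(toy)
```

Because toy.symbol is the stored vector, calling a mutating function on it changes the data frame in place, which is what happened to fh_5 in the question.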

In Julia, by convention, functions with names that end in ! will mutate (some of) their arguments.
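As a small illustration of that convention, the sketch below compares unique and unique! on a plain vector (the variable names and values are made up for the example). It needs only Base Julia.

```julia
symbols = ["GOOG", "AAPL", "GOOG", "MSFT", "AAPL"]

# unique returns a new vector and leaves its argument untouched
distinct = unique(symbols)
length_after_unique = length(symbols)

# unique! removes the duplicates from the argument itself
unique!(symbols)
length_after_unique_bang = length(symbols)
```

After the call to unique the original vector still has five elements; after unique! it has three.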

Cameron Bieganek