I am attempting to scrape stock dividend data from web pages using F# and the FSharp.Data library. An example page can be seen at http://www.nasdaq.com/symbol/ibm/dividend-history.

To request the web page, I set up my code as a simple console app, as follows:

open FSharp.Data

[<EntryPoint>]
let main argv =
    let url = "http://www.nasdaq.com/symbol/ibm/dividend-history"
    let result = Http.RequestString(url)
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code

When run, the RequestString method errors with:

"An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in FSharp.Core.dll

Additional information: Length cannot be less than zero."

It looks like the page is formatted in a way that "traditional" scraping approaches won't work. Any ideas or thoughts would be appreciated.

Tom Atwood
  • I tried this out myself and found that the actual crash happens at `FSharp.Data.HttpHelpers.getAllCookiesFromHeader@671.Invoke(Int32 i, String cookiePart) in ...FSharp.Data\src\Net\Http.fs:line 675`. The code calls `String.Substring` with a length that runs up to the first "=" symbol. That length comes from `String.IndexOf`, which returns -1 when there is no "=" symbol, so `Substring` throws. Relevant source code: https://github.com/fsharp/FSharp.Data/blob/master/src/Net/Http.fs#L674 and its comment: `.NET has trouble parsing some cookies. See http://stackoverflow.com/a/22098131/165633` – Ringil Mar 27 '16 at 17:52

1 Answer

This is the full stack trace I get when I run the code:

System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
   at System.String.Substring(Int32 startIndex, Int32 length)
   at FSharp.Data.HttpHelpers.getAllCookiesFromHeader@671.Invoke(Int32 i, String cookiePart) in C:\Git\FSharp.Data\src\Net\Http.fs:line 675
   at Microsoft.FSharp.Collections.ArrayModule.IterateIndexed[T](FSharpFunc`2 action, T[] array)
   at FSharp.Data.HttpHelpers.getAllCookiesFromHeader(String header, Uri responseUri, CookieContainer cookieContainer) in C:\Git\FSharp.Data\src\Net\Http.fs:line 671
   at <StartupCode$FSharp-Data>.$Http.InnerRequest@803-5.Invoke(WebResponse _arg2) in C:\Git\FSharp.Data\src\Net\Http.fs:line 803
   at Microsoft.FSharp.Control.AsyncBuilderImpl.args@835-1.Invoke(a a)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.FSharp.Control.AsyncBuilderImpl.commit[a](Result`1 res)
   at Microsoft.FSharp.Control.CancellationTokenOps.RunSynchronously[a](CancellationToken token, FSharpAsync`1 computation, FSharpOption`1 timeout)
   at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken)
   at <StartupCode$FSI_0004>.$FSI_0004.main@() in C:\Users\helgeu.COMPODEAL\AppData\Local\Temp\~vs2B9.fsx:line 8
Stopped due to error

I think you have unfortunately stumbled upon a bug in the cookie handling code, tracked here:

https://github.com/fsharp/FSharp.Data/issues/904
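
To make the failure in the stack trace concrete, here is a minimal sketch that reproduces the same exception in isolation. It is not the FSharp.Data source: the helper names `naiveCookieName` and `tryCookieNameValue` and the sample header are made up for illustration, and I am assuming the offending cookie part simply contains no "=" (a bare flag such as `HttpOnly` is used here as an example).

```fsharp
open System

/// Hypothetical helper that mirrors the failure mode: when the part has
/// no '=', IndexOf returns -1 and Substring(0, -1) throws
/// ArgumentOutOfRangeException ("Length cannot be less than zero").
let naiveCookieName (cookiePart: string) : string =
    cookiePart.Substring(0, cookiePart.IndexOf('='))

/// Hypothetical defensive variant: checks the index before slicing and
/// returns None for parts that carry no '=' at all.
let tryCookieNameValue (cookiePart: string) : (string * string) option =
    match cookiePart.IndexOf('=') with
    | -1 -> None
    | i ->
        let name = cookiePart.Substring(0, i).Trim()
        let value = cookiePart.Substring(i + 1).Trim()
        Some (name, value)

// Made-up header value, split the way a simple parser might split it.
let parts = "id=abc123; Path=/; HttpOnly".Split(';')

// Keeps only the parts that have a name and a value; "HttpOnly" is skipped
// instead of crashing the whole request.
let parsed =
    parts |> Array.choose (fun p -> tryCookieNameValue (p.Trim()))
```

Calling `naiveCookieName "HttpOnly"` raises the same `ArgumentOutOfRangeException` as in the trace, while `tryCookieNameValue "HttpOnly"` returns `None`. Whether skipping such parts is the right behavior for a real cookie parser is a design decision for the library; the sketch only shows why the index needs checking before the call to `Substring`.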

<rant>

I have tried to look into that code, but it gives me a headache: it looks like an evil cut and paste of some Google answer on how to handle cookies in C#, badly translated to F#.

</rant>

Adding information to that GitHub issue is probably a better option than posting here.

Helge Rene Urholm
  • This issue appears to have been resolved in FSharp.Data by this PR: https://github.com/fsharp/FSharp.Data/pull/945 – Norman H Oct 10 '18 at 14:42