I am attempting to port an existing project of mine (a web scraper) from Python to F#, in order to learn F#. A component of the program compresses large strings (raw HTML) using LZMA and stores them in SQLite in a makeshift key-value table. The HTML string should always be Unicode.

Because I am an F# beginner and this requires a lot of .NET interop, I am very confused as to how to accomplish this.

I would like to know how to do this properly in F#, using LZMA instead of GZip.

Edit

I had difficulty finding an LZMA2-compatible .NET library, as LZMA-SDK uses LZMA1, which would not have been compatible with my existing data compressed using LZMA2. Therefore, with help from the comments, I implemented this using GZip instead.

profesor_tortuga
  • I don't have time to write a complete answer right now, but some judicious use of [.Net's `MemoryStream` class](https://msdn.microsoft.com/en-us/library/system.io.memorystream(v=vs.110).aspx) might help you avoid the multiple temp files. As for how to use LZMA instead of GZip, for that you'll probably have to use a third-party library (there doesn't appear to be an LzmaStream in .Net). See http://stackoverflow.com/q/7646328/2314532 for some examples of usage -- but I don't have time to explain how to reference NuGet packages in F# right now, so a fuller answer will have to wait, sorry. – rmunn Mar 04 '17 at 08:28
  • Thanks for the direction... I'm making some progress forward, and updated the main post... – profesor_tortuga Mar 04 '17 at 21:40
  • https://github.com/weltkante/managed-lzma can do LZMA2 according to its Readme. – s952163 Mar 05 '17 at 09:37

1 Answer

This uses GZip for compression and is compatible with the `gzip.compress`/`gzip.decompress` functions in Python 3.5.

#if INTERACTIVE
#r "../packages/System.Data.SQLite.Core/lib/net46/System.Data.SQLite.dll"
#endif


open System.IO
open System.IO.Compression
open System.Data.SQLite

let compressString (s:string) =
  let bs = System.Text.Encoding.UTF8.GetBytes(s)
  use outStream = new MemoryStream()
  use gzOutStream = new GZipStream(outStream, CompressionMode.Compress, false)
  gzOutStream.Write(bs, 0, bs.Length)
  // Close before ToArray() so the remaining compressed data is flushed
  gzOutStream.Close()
  outStream.ToArray()

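let decompressString (bs:byte[]) =
  use newInStream = new MemoryStream(bs)
  // This is the reading side, so name it as an input stream
  use gzInStream = new GZipStream(newInStream, CompressionMode.Decompress, false)
  // Decode explicitly as UTF-8 to mirror compressString
  use sr = new StreamReader(gzInStream, System.Text.Encoding.UTF8)
  sr.ReadToEnd()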

let insert dbc (key:string) (value:string) =
    let compressed = compressString value
    // `use` disposes the command once the statement has run
    use cmd = new SQLiteCommand("INSERT into kvt (key, value) VALUES (@key, @value)", dbc)
    cmd.Parameters.Add(new SQLiteParameter("@key", key)) |> ignore
    cmd.Parameters.Add(new SQLiteParameter("@value", compressed)) |> ignore
    // Returns the number of rows affected
    cmd.ExecuteNonQuery()

let fetch dbc (key:string) =
    use cmd = new SQLiteCommand("SELECT value FROM kvt WHERE key = @key", dbc)
    cmd.Parameters.Add(new SQLiteParameter("@key", key)) |> ignore
    use reader = cmd.ExecuteReader()
    // Read() returns false when no row matches the key
    if not (reader.Read()) then failwithf "No value stored for key %s" key
    let compressed = unbox<byte[]> reader.["value"]
    decompressString compressed

let create() =
    SQLiteConnection.CreateFile("mydb.sqlite")
    let dbc = new SQLiteConnection("Data Source=mydb.sqlite;Version=3;")
    dbc.Open()
    use cmd = new SQLiteCommand("CREATE TABLE kvt (key TEXT PRIMARY KEY, value BLOB)", dbc)
    cmd.ExecuteNonQuery() |> ignore
    // The caller owns the open connection and should dispose it when done
    dbc
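
As a quick sanity check, the two compression helpers can be exercised on their own, without SQLite. The sketch below repeats them in a self-contained form, round-trips a string containing non-ASCII characters, and checks that the output starts with the gzip magic bytes `0x1F 0x8B`. The sample string and the names `packed`, `roundTripped` and `hasGzipHeader` are made up for illustration.

```fsharp
open System.IO
open System.IO.Compression
open System.Text

// Same approach as compressString above: UTF-8 encode, then gzip into memory.
let compressString (s: string) =
    let bs = Encoding.UTF8.GetBytes(s)
    use outStream = new MemoryStream()
    use gz = new GZipStream(outStream, CompressionMode.Compress, false)
    gz.Write(bs, 0, bs.Length)
    gz.Close() // flush the remaining compressed data before reading the buffer
    outStream.ToArray()

// Reverse direction: gunzip, then decode the bytes as UTF-8.
let decompressString (bs: byte[]) =
    use inStream = new MemoryStream(bs)
    use gz = new GZipStream(inStream, CompressionMode.Decompress, false)
    use sr = new StreamReader(gz, Encoding.UTF8)
    sr.ReadToEnd()

// Hypothetical sample input with non-ASCII characters.
let sample = "<html><body>naïve café ✓</body></html>"
let packed = compressString sample
let roundTripped = decompressString packed

// Every gzip stream starts with the two magic bytes 0x1F 0x8B.
let hasGzipHeader =
    packed.Length > 2 && packed.[0] = 0x1Fuy && packed.[1] = 0x8Buy

printfn "round trip ok: %b, gzip header: %b" (roundTripped = sample) hasGzipHeader
```

Because the string is encoded as UTF-8 before compression, a reader on the Python side would need to decode the decompressed bytes as UTF-8 as well.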
Mark Bell
profesor_tortuga
  • you could just use [`use`](https://learn.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/resource-management-the-use-keyword) with anything IDisposable and skip the `.Close()`. – s952163 Mar 05 '17 at 09:40
  • It should be noted that you need to call `gzOutStream.Close()` prior to the call to `outStream.ToArray()` inside of `compressString`, otherwise the output will not be fully written and any attempts to decompress it will result in an empty stream. – Chris Altig Nov 28 '19 at 04:03
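
The two comments above can be reconciled: `use` alone is enough, provided the `GZipStream` is disposed before the buffer is read. A minimal sketch, assuming hypothetical helper names (`writeGzip`, `compressScoped`, `unpack`) that are not part of the answer, gives the `GZipStream` its own function scope so that it is flushed and disposed before `ToArray()` is called:

```fsharp
open System.IO
open System.IO.Compression
open System.Text

// Hypothetical helper: the GZipStream lives only for the duration of this call.
// leaveOpen = true keeps the target stream usable after the GZipStream is disposed.
let writeGzip (target: Stream) (bs: byte[]) =
    use gz = new GZipStream(target, CompressionMode.Compress, true)
    gz.Write(bs, 0, bs.Length)
    // gz is disposed here, which flushes the remaining compressed data

// Hypothetical variant of compressString with no explicit Close() call.
let compressScoped (s: string) =
    let bs = Encoding.UTF8.GetBytes(s)
    use outStream = new MemoryStream()
    writeGzip outStream bs
    outStream.ToArray()

// Hypothetical helper used only to check the result.
let unpack (bs: byte[]) =
    use inStream = new MemoryStream(bs)
    use gz = new GZipStream(inStream, CompressionMode.Decompress)
    use sr = new StreamReader(gz, Encoding.UTF8)
    sr.ReadToEnd()

printfn "%s" (compressScoped "hello" |> unpack)
```

If both `use` bindings sit in the same scope, as in the original `compressString`, the `GZipStream` is only disposed when the function returns, which is after `ToArray()` has already run; that is why the explicit `Close()` is needed there.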