
Based on your actual experience, or on a whitepaper or other respected, referenceable study, is F# currently a viable tool for corporate/enterprise-level reporting?

Attention: Before voting to close this question as "not constructive", please read the bit at the bottom.

Background
I currently work at a large corporation which makes heavy use of many different reporting tools, including (but hardly limited to) SAS, Cognos, SSRS and even a good smattering of COBOL. Each tool has its rightful place and many of them are, in most respects, equivalent in feature set, etc. Most of our tools are able to output to PDF, Excel and databases relatively easily and in those cases work wonderfully.

Unfortunately, my organization, like many, makes use of Excel spreadsheets and, love it or hate it, we spend many hours writing .NET console applications to extract information from and insert information into Excel spreadsheets. (I'm not interested in arguing the merits or detriments of this approach. It is what it is and there's no way I can change it.)

As great as the reporting technologies listed above are, they fall flat when it comes to advanced ETL from or into spreadsheets. They just weren't designed for it and while they are perfectly adept at formatting a report as an Excel spreadsheet, they aren't very good at updating an existing spreadsheet or extracting data in some very specific way (extract only values highlighted in red, for example). So we end up writing a LOT of .NET console applications to do this bit. (Again - not interested in debating the approach. It is what it is. I know - I don't like it either.)

.NET is, in my opinion, a fantastic framework and flexible enough to handle almost any programming task, so we could theoretically handle all of the reporting in .NET. But - trying to handle all of the reporting in .NET takes too long. We have to write all the boilerplate stuff ourselves. I like to leverage the power, simplicity and robustness of the actual reporting tools we already have.

So, we end up writing two applications for a single task - for example, a SAS job to load the data from multiple data sources, do the transformations and store the result in a permanent or temporary location, and a second .NET job to take the results and load them into the spreadsheet. (I know.)

The Point
I've been seeing and hearing a great deal about F# in the past couple of years and I've dabbled in it a bit myself. I learned OCaml in college and I love functional programming. When called for, I'd love to do all the programming for a particular report on a single platform (if not a single language). The question, though, is whether the F# language and the .NET framework are fully ready for enterprise-level reporting - and I'm talking reports that must be run accurately and efficiently. Microsoft is certainly selling it hard, but I want to know if anyone with experience in other reporting technologies has actually tried it in a production environment. How does it compare with other reporting technologies and can it be easily integrated into a corporate environment? How did you address security? Done right, what kind of memory-profile does F# require (we're talking millions of records)? Does it process tabular data well? Is it efficient? How easy is it to maintain (especially if the code grows)? What kind of third-party add-ons, plug-ins, etc. are required to get something working (or can it do most everything out of the box)? How much work (programming hours, etc) is required compared to other reporting systems (for similar results)?

If you have no experience with F#, or if you use F# exclusively, then I'm not particularly interested in your opinion - I'd like to hear from those who have actually bridged the gap and can relate, from experience, the opportunities and pitfalls in using F# as a reporting engine for big data (millions of records, outputted to a variety of formats).

I've seen a few questions that already cover some of this ground:

But they are a few years old. Several versions later, is F# up to the task? Or am I a dog barking up the wrong tree?

EDIT

Just for clarity, I am particularly interested in F#'s new information-rich programming. Prior to F# 3.0, it was merely an interesting technology, but F#'s recently added capabilities to use database type providers and its query expressions make it look like a viable alternative to other report authoring technologies. Microsoft is certainly suggesting it is.
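For readers who have not seen F# 3.0 query expressions: the same `query { ... }` syntax that works against a database type provider also works over ordinary in-memory sequences. This is only a minimal sketch; the `Sale` record and its rows are hypothetical stand-ins for a database table, and no type provider or database is involved.

```fsharp
// Hypothetical record type standing in for a table row.
type Sale = { Region: string; Amount: decimal }

// Made-up sample rows; with a SQL type provider this would be a database table instead.
let sales =
    [ { Region = "North"; Amount = 120.0M }
      { Region = "South"; Amount = 80.0M }
      { Region = "North"; Amount = 45.5M } ]

// Filter, group and aggregate with a query expression.
let totalsByRegion =
    query {
        for s in sales do
        where (s.Amount > 50.0M)
        groupBy s.Region into g
        let total = query { for x in g do sumBy x.Amount }
        select (g.Key, total)
    }
    |> List.ofSeq

for (region, total) in totalsByRegion do
    printfn "%s: %M" region total
```

Against a real database the query would be translated to SQL by the provider; over a list it simply runs in memory.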

An acceptable answer would contain a first-hand account (or a reference to a documented case study) of implementing an enterprise-level reporting engine built in F# and a comparison to another reporting technology of any performance gains or losses, etc. It doesn't have to be too detailed - just enough to convince an average (competent) manager that F# would be an appropriate/inappropriate technology for bulk/batch data processing. Has it been done? Who did it? What were the results? How complicated was the implementation (relative to similar technologies)? Does it perform well?


Why am I asking a subjective question?
Like most good stackoverflow members, I frequently vote to close subjective questions. According to the FAQ, subjective questions should be avoided but are not banned entirely. The FAQ links to six guidelines for great subjective questions which I have tried to follow. Please read those guidelines before voting to close this question.

JDB
  • In terms of things like memory use and performance, you'll get a similar profile to C#. You should be able to find plenty of evidence that enterprise-grade systems are built with C#. Of course, the exact characteristics will also depend somewhat on how you structure your code (i.e. you may find that you use less memory when using structs even though records might otherwise be more idiomatic, etc.). – kvb Jan 31 '13 at 16:44
  • @kvb - Microsoft claims on their website that F# is faster than C# in some cases but doesn't elaborate (that I've found). There would appear to be performance differences, even though it all compiles down to the same CIL. *The resulting code runs at the speeds much faster than languages such as Python, JavaScript or R, and in some cases significantly faster than C#.* (but that's [Microsoft talking](http://www.tryfsharp.org/Explore) - I want unbiased answers) – JDB Jan 31 '13 at 16:54
  • Somehow this feels like the wrong question is being asked. If very targeted insertion/extraction is the issue can you use formulas to push/pull between the presentation data and a shadow copy that's in tabular format (or whatever works best with your current reporting tool)? – Daniel Jan 31 '13 at 19:13
  • @Daniel - Let's just say that the problem is bigger than I've described. I just offered that as a simple example of something SAS, Cognos, etc., cannot do. The right question is being asked. – JDB Jan 31 '13 at 19:16
  • ...Beyond that, I'm not sure what an acceptable answer would be. You can build whatever you want with .NET and all of your concerns (memory, scalability, etc) can be addressed with enough work. .NET vs some-reporting-tool is like comparing building supplies to a house. – Daniel Jan 31 '13 at 19:16
  • @Daniel - F# is nothing like C#. Saying that they are the same because they both run on the .NET platform is like saying that all Windows applications are the same because they run on the same OS. They are clearly geared for different tasks - the question is whether F# is mature enough to replace other technologies, such as SAS. I want to know if anyone has done it successfully and what issues they ran into. – JDB Jan 31 '13 at 19:24
  • Replacing SAS is a _monumental_ task. I've used C# and F# for a long time and the differences between them have very little relevance to your question. – Daniel Jan 31 '13 at 19:29
  • @Daniel - I'm not looking to replace SAS entirely. I am also a SAS programmer and I know how impossible that would be. But is F# a viable alternative given the right conditions? Can it handle large datasets efficiently? Have you ever used it to crunch millions of records and do complex joins? "I've tried it and it doesn't work" would be a fantastic answer. – JDB Jan 31 '13 at 19:33
  • You're comparing apples and oranges. F# is not a query language, or database, or analytical engine, etc. It is a general-purpose programming language. And, with enough work, you can build whatever you want with it. – Daniel Jan 31 '13 at 19:35
  • @Daniel - It seems to be becoming more of a query language in 3.0 with its "information-rich programming". 2.0 wasn't interesting, but 3.0 is. – JDB Jan 31 '13 at 19:38
  • It is still nowhere near a high-performance query engine. You will have to build it. – Daniel Jan 31 '13 at 19:39
  • @Daniel - You say it is nowhere near a high-performance query engine. On what do you base that claim? Experience? Documentation? That's exactly what I'm looking for and what I'd consider to be an acceptable answer (as long as you can back it up). – JDB Jan 31 '13 at 22:11
  • **Gngn, closed without comment by 5 users [who](http://stackoverflow.com/search?q=user%3A95190+[f%23]) [have](http://stackoverflow.com/search?q=user%3A334849+[f%23]) [never](http://stackoverflow.com/search?q=user%3A680925+[f%23]) [been](http://stackoverflow.com/search?q=user%3A680925+[f%23]) [near](http://stackoverflow.com/search?q=user%3A589909+[f%23]) the F# tag!!!!!** – Benjol Feb 01 '13 at 06:30

6 Answers


How does it compare with other reporting technologies and can it be easily integrated into a corporate environment?

I don't know how F# compares with other reporting technologies but I have deployed it in more than one corporate environment and it is basically the same as C#, i.e. easy and robust.

How did you address security?

Same as C#.

Done right, what kind of memory-profile does F# require (we're talking millions of records)?

I've found one GC bug in .NET in 5 years of use and it was not specific to F#. I've had several problems with large objects (again, not F# specific) but, in general, the GC is robust and efficient and collects aggressively.

I've processed billions of records and found F# to be extremely fast and very reliable. Note that F# is used in Microsoft's Bing AdCenter (for ad placement) and Microsoft's Halo 3, both of which require terabyte datasets to be processed.

Does it process tabular data well?

Yes, and you get easy parallelism (see the Array.Parallel module), but its main strength relative to other tools is in manipulating structured data like trees and graphs.
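As a rough illustration of the `Array.Parallel` module mentioned above: `Array.Parallel.map` has the same signature as `Array.map` but distributes the work across cores. The `score` function and the input array here are hypothetical, chosen only to make each element CPU-bound; the actual speedup depends on the workload and the machine.

```fsharp
// Hypothetical per-row calculation, made deliberately CPU-bound.
let score (x: float) =
    let mutable acc = 0.0
    for i in 1 .. 1000 do
        acc <- acc + sin (x * float i)
    acc

let inputs = Array.init 20000 float

// The sequential and parallel versions differ only in the module used.
let sequentialResults = inputs |> Array.map score
let parallelResults = inputs |> Array.Parallel.map score
```

Because each element is computed independently, both arrays contain the same values; only the scheduling differs.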

Is it efficient?

Yes.

Our current client, one of the world's largest insurance companies, saw a 10x performance improvement switching from C++ to F# (as well as a 10x reduction in code size).

A previous client saw a performance improvement moving a compiler from OCaml to F#. This is impressive because OCaml was specifically designed for writing compilers and is extremely fast.

A former client had us rewrite their trading platform and we saw 100x throughput and latency improvements even though we were moving from non-GC C++ to GC'd F#.

How easy is it to maintain (especially if the code grows)?

Easy to maintain. In ML, adding functions is a no-brainer, and the static type system gives you lots of feedback when you extend union types.
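To make the point about union types concrete, here is a minimal sketch using a hypothetical `OutputFormat` type (not from any real codebase). Adding a new case to the union makes every `match` that doesn't handle it produce compiler warning FS0025 (incomplete pattern matches), which points you at each place that needs updating.

```fsharp
// Hypothetical report output formats.
type OutputFormat =
    | Pdf
    | Excel of sheetName: string
    | Database of table: string

let describe format =
    match format with
    | Pdf -> "PDF file"
    | Excel sheet -> sprintf "Excel sheet %s" sheet
    | Database table -> sprintf "database table %s" table

// If a new case such as `| Csv` is later added to OutputFormat, the match
// above is no longer exhaustive and the compiler reports warning FS0025
// at that match expression.
```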

Our current client put their first F# code live last April and its maintainer had no problems despite not having had any training in F# (or OCaml) at all.

What kind of third-party add-ons, plug-ins, etc. are required to get something working (or can it do most everything out of the box)?

We have never used any (but we sell two!). The only third party things I've considered are WPF controls which are, again, not F# specific.

How much work (programming hours, etc) is required compared to other reporting systems (for similar results)?

No idea, sorry. Looks like we've got some work with Dialogue and HP Extreme coming up so I'll find out soon enough...

How complicated was the implementation (relative to similar technologies)?

F# code is much simpler than older mainstream languages like C++, C# and Java.

I'd like to stress that F# really pays dividends when you use it to attack problems that are too complicated to solve using more traditional tools, rather than just rewriting old code in F#.

For example, our current client have been using a business rules engine that cost them around £1,000,000 to buy but it doesn't solve their business problem (struggles with big tables, struggles with mathematics) so I wrote them a demo of a bespoke business rules engine in one week in around 1,000 lines of F# code. I could not have done that with any other tool.

J D
  • "...but its main strength relative to other tools is in manipulating structured data like trees and graphs." Over the last year I have read and followed most of your comments (and arguments) about F#. I have never found them to be incorrect. The quotation is no exception. My company was using VoltDB to compute certain matches. I re-wrote the entire module, using an in-memory trie-like structure in F#. Even for complex queries, my 800 lines of F# handily beat VoltDB for speed. No, my F# is not performing a logical subset of the VoltDB operations. Edit: my trie has about 3.2x10^6 nodes. – Shredderroy Feb 01 '13 at 01:27

To answer your question – you’re on the right track. I say this as someone who has built a number of reporting and big data systems. I built one of the Big Data Analytics platforms used at eBay in Scala and R. More recently I built the Hadoop / Hive F# Type Provider for MSRC. I can say that nothing comes close to the F#/.NET stack for this purpose. Great performance, easy-to-use native interop, lots of libraries, REPL, Type Providers, WPF for charting. Since MSRC I have been building a fully featured F# IDE that can be embedded into Excel, where you can use a Type Provider to interact with the workbook, complete with IntelliSense. Email me if you’d like to see it.

Edit:

Sure; I replaced a customer's Infobright database with F#, using in-memory data and a from-scratch query engine. It reduced query time on 10s of GBs of data from 30 minutes to 100s of milliseconds. The whole thing took me 6 hours to build and was only a few hundred lines of code. The database was the backend to a web-based reporting service, which became immensely more responsive after the upgrade.

While at eBay I used to do my Big Data (bulk/batch) post-processing in R. The basic flat files were 10s of GBs, so they were way too big for Excel. R did a huge amount of unnecessary memory allocation during the aggregation passes; 10GB would become 40GB, and the job would grind to a halt once it started hitting the pagefile. Depending on the data it would take minutes, hours or never finish. There are paid R libraries that fix this but they are limiting in other ways. Doing the aggregations in F# brought this down to 100s of milliseconds with constant space. These aggregations were 10s of lines of code, about the same as R but much easier to understand, and they were type checked. Having an R job fail after an hour of processing because of a typo is infuriating.
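The answer doesn't show its aggregation code, so the following is only a sketch of the general constant-space pattern, assuming a hypothetical two-column, tab-delimited input (a key and a numeric value). Lines are consumed lazily and accumulated into a `Dictionary`, so memory grows with the number of distinct keys rather than with file size. The function takes a `seq<string>` to stay self-contained; for a real file you would pass `File.ReadLines path`, which reads lazily (unlike `File.ReadAllLines`).

```fsharp
open System
open System.Collections.Generic
open System.Globalization

// Sums "key<TAB>value" lines per key. The input sequence is consumed lazily,
// so memory use depends on the number of distinct keys, not on the line count.
let sumByKey (lines: seq<string>) =
    let totals = Dictionary<string, float>()
    for line in lines do
        let fields = line.Split('\t')
        if fields.Length >= 2 then
            let value = Double.Parse(fields.[1], CultureInfo.InvariantCulture)
            match totals.TryGetValue(fields.[0]) with
            | true, current -> totals.[fields.[0]] <- current + value
            | false, _ -> totals.[fields.[0]] <- value
    totals

// Usage with a real file (the path is hypothetical):
// let fileTotals = sumByKey (System.IO.File.ReadLines "data.tsv")
let totals = sumByKey [ "a\t1.5"; "b\t2"; "a\t3" ]
```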

I used to use OLAP cubes (e.g. Microsoft Analysis Services), but these systems have been entirely eclipsed by Big Data clusters and Big Memory machines. Now it is easy to build your own Big Memory machine with F# and the new garbage collector in .NET 4.5.

Hope that helps.

moloneymb
  • Welcome to SO! You might want to review the newly created [About page](http://stackoverflow.com/about) to get to know the site. Thanks for your answer - can you expand on your experiences writing reporting applications with F# and how that compared with other technologies you've used? Pros/Cons. (You can use the edit feature to add to your original answer.) – JDB Jan 31 '13 at 21:40
  • I'd love to see your embedded F# IDE! Would probably make an awesome blog post? – Robert Jeppesen Feb 01 '13 at 08:08
  • Would love to show it off publically but it is not fully baked. I’ll have a private beta starting in a few months. – moloneymb Feb 02 '13 at 22:17
  • @moloneymb where can I sign up for this private beta? – vlad Feb 04 '13 at 22:54
  • G'day Vlad, best send me an email, is it visible on my profile? – moloneymb Feb 05 '13 at 13:45
  • @moloneymb - No, email addresses are not visible in profiles. It is [recommended that you use chat](http://meta.stackexchange.com/q/80776/191410) to contact stack overflow members. If you have a service you wish to advertise, you can add information about it to your "About Me" in your profile (including an email address) - just realize it is visible to everyone on the internet. – JDB Feb 05 '13 at 14:19
  • Hi folks, I’ve added my contact details to the about me section. Feel free to email / skype me. Cheers - Matt – moloneymb Feb 06 '13 at 18:25
  • You can now see the editor at work in Excel at http://www.youtube.com/watch?v=XsNa2LbIdFA – moloneymb Mar 10 '13 at 13:40
  • @moloneymb I'm curious, how stable is the in-memory store that you built to replace InfoBright? For example, how does it handle deallocations? Also, what .NET structure (if any) did you use to store the data? – pim Dec 11 '18 at 12:33

I'm not sure how much this helps, but there are a few whitepapers about F# on Microsoft's website. The first one I linked below specifically mentions statistical processing / databases, so it may be the most useful of the three.

There's also an R type provider for F#, which makes for easy interoperation between F# and R.

Jack P.
  • The first two papers are not really on topic, but the third is useful. +1 for that, although I'm really looking for first-hand accounts (or at least a paper published by someone other than Microsoft or one of its partners). – JDB Jan 31 '13 at 18:52

If you're hoping to create an "enterprise-grade reporting system with better Excel automation," I think you're barking up the right tree (i.e. it's doable) but there's a bear (not a squirrel) in the tree. In other words, it would rarely be worthwhile. Now, maybe your situation is the exception. Extraordinary needs call for extraordinary measures. But, I wonder if there's some way to abstract the bits of this that can't be done by your reporting system and focus on improving interoperability...instead of building everything from scratch. The right approach, I think, will depend greatly upon the details, which you know best and, I presume, are too many to enumerate here.

Daniel
  • Excel automation is one example. Another I was just discussing with a colleague is interactivity. We can't put SAS on every business user's machine, and Cognos & SSRS allow only so much interactivity. F# has no such limit. Again, though, I know the theoretical advantages of F# - but has anyone actually made it work in their environment? *(Also, I know, I know, I know - Excel is a terrible reporting platform. But when you receive Excel "templates" from the state that must be populated with 5,000 data points, there's not much you can do.)* – JDB Jan 31 '13 at 19:30

I once used F# to aggregate over a tab-delimited text file containing 890,000 records (500 MB) in about 20 seconds. It should be even faster on newer hardware with Win8 and .NET 4.5. I think it's reasonably fast.

Not sure what your reporting requirements are but check out SQL Server Analysis Services (SSAS) and Reporting Services.

SSAS now comes with an in-memory 'tabular' engine. I recently tested that with 1 billion rows. Excel Pivot table queries aggregating over a billion rows happened in about 2 seconds.

fwaris
  • I'm mostly interested in F#. We already use a variety of reporting platforms and technologies (including SSRS and SSIS). – JDB Feb 01 '13 at 15:22

Off topic, but you may want to automate your Excel workflow a bit using other tools like XLReport or its bigger cousin DBxtra. Both can read from Excel files, make queries based on them, and export the results, either manually or (in the case of DBxtra) automatically. The good side of both is that if the structure of the Excel files doesn't change, you need to design the queries just once.

Miguel Garcia
  • I appreciate the info, but I'd like answers on this question to stay on topic (otherwise it will quickly get out of hand). We already automate our Excel workflow - that's what the .NET applications do. (You can easily delete the answer to recoup your rep or let it stand if you disagree and think it useful for this question.) – JDB Feb 01 '13 at 16:23