F# Business Intelligence Case Study - Xbox Live TrueSkill
Some time ago, I speculated that F# should be the lingua franca of BI.
Well, had I done a little research, I would have found that people are already doing BI with F#, namely Microsoft. On a recent episode of The Thirsty Developer podcast (to be published soon), some colleagues and I were talking with members of our local evangelism team when Jason Bock told me that TrueSkill was implemented in F#. To which I replied: that sure sounds like Business Intelligence - or something like it!
Since I am making yet another audacious claim, perhaps an explanation of what TrueSkill is would be helpful.
If you play Halo on the Xbox 360, you are probably familiar with the concept of TrueSkill, which is used to measure how you compare against the bell curve distribution of other players of a given game.
As for the BI part, the TrueSkill analysis task involved the following:
* Multiple terabytes of matchmaking log data
* Data spread over 11,000 text files
* ETL of that data into SQL Server
* F# calculations over this massive database to determine TrueSkill rankings for players
We are probably not talking about simple averages and things you would do with a cube here. The resulting solution, as I was able to ascertain from a blog post by Don Syme, was about 100 lines of F# code to create a production version of this that runs as an ongoing task.
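To give a feel for why this is not "simple averages", here is a minimal sketch of the kind of calculation involved: a two-player, no-draw rating update in the style of the published TrueSkill formulas. This is my own illustration, not the production code. The names `Rating`, `update`, and `shrink` are hypothetical, the starting values (mean 25, deviation 25/3, beta 25/6) are the commonly quoted defaults, and the draw margin and skill drift over time are left out. The `erf` function is a standard numerical approximation, so results for extreme upsets will be imprecise.

```fsharp
// Illustrative sketch only: two-player, no-draw TrueSkill-style update.
// All names here are hypothetical and not taken from Microsoft's code.
type Rating = { Mu: float; Sigma: float }

// Commonly quoted defaults (assumed, not taken from the original post).
let defaultRating = { Mu = 25.0; Sigma = 25.0 / 3.0 }
let beta = 25.0 / 6.0

// Standard normal density.
let pdf x = exp (-0.5 * x * x) / sqrt (2.0 * System.Math.PI)

// Abramowitz-Stegun approximation of erf (absolute error around 1.5e-7).
let erf x =
    let sign = if x < 0.0 then -1.0 else 1.0
    let ax = abs x
    let t = 1.0 / (1.0 + 0.3275911 * ax)
    let poly =
        ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
          - 0.284496736) * t + 0.254829592) * t
    sign * (1.0 - poly * exp (-ax * ax))

// Standard normal cumulative distribution.
let cdf x = 0.5 * (1.0 + erf (x / sqrt 2.0))

// Returns the updated (winner, loser) ratings after one match.
let update (winner: Rating) (loser: Rating) =
    let c = sqrt (2.0 * beta * beta + winner.Sigma ** 2.0 + loser.Sigma ** 2.0)
    let t = (winner.Mu - loser.Mu) / c
    let v = pdf t / cdf t      // how surprising the result was
    let w = v * (v + t)        // how much uncertainty to remove
    let shrink (r: Rating) =
        r.Sigma * sqrt (1.0 - r.Sigma ** 2.0 / (c * c) * w)
    let winner' = { Mu = winner.Mu + winner.Sigma ** 2.0 / c * v; Sigma = shrink winner }
    let loser' = { Mu = loser.Mu - loser.Sigma ** 2.0 / c * v; Sigma = shrink loser }
    winner', loser'

// Usage: two brand-new players, the first one wins.
let newWinner, newLoser = update defaultRating defaultRating
printfn "%.2f %.2f" newWinner.Mu newLoser.Mu
```

When two new players meet, the winner's mean moves up and the loser's moves down by the same amount, and both deviations shrink because the system is now a little more certain about each of them.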
Now, if we replace Xbox Live log entries with customer records, I think we are onto something here - especially for more complex and/or scientific scenarios. Imagine that, rather than TrueSkill, we used this to estimate, say, "TrueLikelihoodToUpgradeFlights" so we could help airlines target better upsell opportunities.
But the bigger point - one pretty much demonstrated by this whole scenario - is that there is no reason - none. at. all - that BI has to be a database technology. BI can come from anywhere, and the best tools for the job in BI are functional languages, because they make writing massively parallelizable code a great deal simpler than imperative languages do. Sometimes that language is SQL and its database-oriented BI variants - and sometimes, as in this case, it's F#.
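As a small illustration of the parallelism point, here is a sketch that computes a win rate per batch of log entries, first sequentially and then in parallel. The `LogEntry` record and the `winRate` function are hypothetical stand-ins I made up for this example; `Array.map` and `Array.Parallel.map` are part of the F# core library.

```fsharp
// Hypothetical log record, for illustration only.
type LogEntry = { PlayerId: int; Won: bool }

// A pure function: the result depends only on the input,
// so it is safe to evaluate on any thread.
let winRate (entries: LogEntry[]) =
    if entries.Length = 0 then 0.0
    else
        let wins = entries |> Array.filter (fun e -> e.Won) |> Array.length
        float wins / float entries.Length

// Sequential version: one batch after another.
let winRatesSeq (batches: LogEntry[][]) =
    batches |> Array.map winRate

// Parallel version: the only change is the module name.
let winRatesPar (batches: LogEntry[][]) =
    batches |> Array.Parallel.map winRate
```

Because `winRate` touches no shared mutable state, switching from the sequential to the parallel version needs no locks and no restructuring, which is the property that makes this style attractive for large analysis jobs.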
It is high time we stop treating business intelligence and software development as if they are different things. They really aren't.
Update: from Microsoft has provided some more links for some of the background:
* Slides and presentations of an ICFP 2007 talk by Phillip Trelford on the application of F# for TrueSkill
* A presentation of the mathematics and tools used in the TrueSkill analyses at KDD
* An F# implementation of TrueSkill at our blog