Getting Simple Things Done in F# Part 2 - Parsing a text file into a sequence of a custom type
In this post, we open a file that contains a long list of drugs from the FDA website and parse them into an immutable data structure.
The key is, of course, the idea of an F# sequence. Think of a sequence as something you iterate over lazily, one element at a time, with potentially no end. That is, you would write a sequence for:
Finding the digits of pi
Finding primes
Reading from huge files
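To make the "potentially no end" idea concrete, here is a minimal sketch of the primes case. The names isPrime, primes and firstFive are made up for this illustration, and trial division is just the simplest thing that works, not a recommendation.

```fsharp
// Trial division: n is prime if no d with 2 <= d <= sqrt(n) divides it.
let isPrime n =
    n >= 2 && Seq.forall (fun d -> n % d <> 0) (seq { 2 .. int (sqrt (float n)) })

// An unbounded sequence of candidates; nothing is computed until someone iterates it.
let primes = Seq.initInfinite (fun i -> i + 2) |> Seq.filter isPrime

// Only as many candidates as are needed to find five primes are ever tested.
let firstFive = primes |> Seq.take 5 |> Seq.toList
printfn "%A" firstFive   // prints [2; 3; 5; 7; 11]
```

The sequence itself never ends; it is the consumer (Seq.take 5 here) that decides how much of it gets evaluated.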
In this case, we have a huge file that we want to read a few lines from, and do so in a somewhat intuitive way. We create a function called contentsOf that takes a filename as its parameter (though a URL or anything you can open a StreamReader with should work), and use it to grab, line by line, the contents of the given resource. We then pass those lines to our drugParser function, which chops out of each line the pieces relevant for our Drug record. At the end of this, we have some fun by creating some functions that print out the names of the first few drugs.
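As a rough sketch of the "anything you can read lines from" idea, the helper below yields lines lazily from any TextReader. The name linesOf and the sample text are assumptions made for this illustration; it is not the contentsOf function from the listing further down.

```fsharp
open System.IO

// Hypothetical helper: pulls one line at a time from any TextReader
// (a StreamReader over a file, a StringReader over a string, ...).
// ReadLine returns null at the end of the input, which ends the sequence.
let linesOf (reader : TextReader) =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None
        | line -> Some (line, ())) ()

// A StringReader stands in for a real file so the sketch is self-contained.
let sample = new StringReader("first\nsecond\nthird")

// Only the first two lines are ever read from the underlying reader.
let firstTwo = linesOf sample |> Seq.truncate 2 |> Seq.toList
printfn "%A" firstTwo   // prints ["first"; "second"]
```

Note that the sequence is tied to one reader, so it is only meant to be walked once.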
Is this revolutionary? No - this is basic file I/O. That said, in a small number of lines of code we wrote a routine that generates a nice collection we can use to do some bigger and better things.
#light
#r @"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll"
open System
open System.IO
open System.Linq
//represents the details of a pharmaceutical, as detailed on the FDA website (http://www.fda.gov/cder/ndc/)
type Drug =
    {
        listingSequenceNumber: int;
        labelerCode: string;
        productCode: string;
        strength: string;
        unitStrength: string;
        overTheCounter: bool;
        tradeName: string
    }
[<STAThread>]
do
    //a drug parser takes a sequence of strings that probably come from a formatted text file and lazily turns each one into a Drug
    let drugParser (druglines : seq<string>) =
        seq {
            for line in druglines do
                yield { listingSequenceNumber = line.Substring(0,7) |> Convert.ToInt32;
                        labelerCode = line.Substring(8,6);
                        productCode = line.Substring(15,4);
                        strength = line.Substring(20,10);
                        unitStrength = line.Substring(31,10);
                        //the Rx/OTC flag is assumed to be the single character at index 42, between the unit and the trade name
                        //(Substring(43) would return the whole rest of the line and never match "R" or "O")
                        overTheCounter = (match line.Substring(42,1) with "R" -> false | "O" -> true | _ -> false);
                        //assumes every line is padded to the full 144 characters
                        tradeName = line.Substring(44,100).Trim() }
        }
    //function that reads the contents of a file, line by line, into a lazy sequence
    //(File.ReadAllLines would load the whole file into memory before yielding the first line)
    let contentsOf (file : string) =
        seq { use reader = new StreamReader(file)
              while not reader.EndOfStream do
                  yield reader.ReadLine() }
    //a sequence of all current drugs from a specific file
    //listings.TXT is available from http://www.fda.gov/cder/ndc/
    let allCurrentDrugs = contentsOf "listings.TXT" |> drugParser
    //the first N drugs from the file
    //(the type annotations pin down which Enumerable.Take overload is meant)
    let take (source : seq<Drug>) (count : int) = Enumerable.Take(source, count)
    let firstNDrugs n = n |> take allCurrentDrugs
    //prints the first N drugs from the file
    let printFirstNDrugs n =
        for drug in n |> firstNDrugs do
            printf "Drug Name = %s\n" drug.tradeName
    10 |> printFirstNDrugs
    printf "Done\n"
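As one small example of the "bigger and better things" a sequence of drugs allows, the sketch below filters for over-the-counter products. It is self-contained, so it uses a cut-down Drug record and two made-up lines built by a hypothetical makeLine helper instead of the real listings.TXT; the column offsets are assumed to match the parser above.

```fsharp
open System

// Cut-down version of the Drug record, just enough for this illustration.
type Drug = { listingSequenceNumber : int; overTheCounter : bool; tradeName : string }

// Hypothetical helper: builds a made-up fixed-width line with the assumed layout
// (7, 6, 4, 10, 10, 1 and 100 character fields, separated by single spaces).
let makeLine (seqNo : int) (flag : string) (name : string) =
    let fields =
        [| seqNo.ToString().PadLeft(7)
           "".PadRight(6)
           "".PadRight(4)
           "".PadRight(10)
           "".PadRight(10)
           flag
           name.PadRight(100) |]
    String.Join(" ", fields)

// Same slicing idea as the parser above, limited to the fields kept here.
let parse (line : string) =
    { listingSequenceNumber = Convert.ToInt32(line.Substring(0, 7))
      overTheCounter = (line.Substring(42, 1) = "O")
      tradeName = line.Substring(44, 100).Trim() }

// Invented sample data, not taken from the FDA file.
let drugs = [ makeLine 1 "R" "EXAMPLE RX"; makeLine 2 "O" "EXAMPLE OTC" ] |> Seq.map parse

// Ordinary Seq functions compose over the parsed records.
let otcNames =
    drugs
    |> Seq.filter (fun d -> d.overTheCounter)
    |> Seq.map (fun d -> d.tradeName)
    |> Seq.toList

printfn "%A" otcNames   // prints ["EXAMPLE OTC"]
```

Because everything stays a sequence until Seq.toList, the same pipeline pointed at the real file would still read only as many lines as the consumer asks for.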