In this blog post, I show little code review of a code sent to me by friend. The code is quite simple example on how to read XML file using XmlProvider. My friend’s biggest concern was that his code is still procedural and not ‘functional’. I was asked if this code can be refactored to be more functional, whatever it means.

First of all, the code that I got, is fine and most importantly it works. Even when you are using F# to solve your problems, you don’t really have to make your code super functional. That’s the beauty of this language. You can still use state-full and mutable approach to implement solution. You don’t have to make this code functional just for the sake of it.

Said that, I will still try to make the code better and in some ways more ‘functional’ to show some awesome F# features and approaches that do make the difference. My main goal here is to to remove loops and mutability.

XML structure

<catalog name="TERC" type="all" date="2015-01-01">
 <row>
   <col name="WOJ">02</col>
   <col name="POW"/>
   <col name="GMI"/>
   <col name="RODZ"/>
   <col name="NAZWA">DOLNOsLaSKIE</col>
   <col name="NAZDOD">województwo</col>
   <col name="STAN_NA">2015-01-01<col>
 </row>
...

The XML is a representing a Collection of ‘rows’ with parameters specified as ‘cols’. Each ‘Col’ has a param name and value. Some of the params are missing value.

Original Code

open FSharp.Data
open System

type Record = {
               voivoship : Option<int>;
               district : Option<int>;
               community:Option<int>;
               rozd:Option<int>;
               name: Option<string>;
               addinf: Option<string> 
              }

type InputXml = XmlProvider<"Terc.xml">
let tercs = InputXml.GetSample()

let records =
    seq {

        for row in tercs.Catalog.Rows do          

            let  vs = ref None
            let  dt = ref None
            let  cy = ref None
            let  rd = ref None
            let  ne = ref None
            let  af = ref None          

            for col in row.Cols do
                match col.Name with
                | "WOJ" -> vs := col.Number
                | "POW" -> dt := col.Number
                | "GMI" -> cy :=  col.Number
                | "RODZ" -> rd :=  col.Number
                | "NAZWA" -> ne := col.String
                | "NAZDOD" -> af :=  col.String
                | _ -> ()

            yield {
                    Record.voivoship = !vs;
                    Record.district = !dt;
                    Record.community = !cy;
                    Record.rozd = !rd;
                    Record.name = !ne;
                    Record.addinf = !af
                  }
         }
  • Record type defined with all the values as options. Column values are optional thus all the properties are declared as options. This allows the usage of 'None' value.
  • The data is being read using FSharp.Data and XmlProvider
  • Collection of records is created by using Sequence and yield operator.
  • The core mapping code from XML to record is in the loop.
  • By default all the values, that hold col value, are set to None and then if one of them exists we do set the proper value
  • I am not sure why 'ref' has been used here. It can be changed to just mutable 'let binding'. More on ref
  • Because 'ref' is used, there has to be '!' used to get the data. Thus this weird syntax of '!cy' to extract the value from 'ref'.

So where to start with some changes ?

With operator and stateless record change

I start with a simple, yet a nice change, that makes the code cleaner.

Instead of extracting all the values and then creating a record instance at the end, I use ‘with’ operator and compose the Record on the fly.

In order to do this, I need to define a function to create a default record type with all the values set to ‘None’ ( option type ). Thanks to this, I can remove ref’ and weird initialization of temp values.

let createRecord =
        {
            voivoship = None
            district = None
            community = None
            rozd = None
            name = None
            addinf = None
        }

Then, I introduce a function that maps the ‘col’ and its value to proper field in the Record

let addCol name number string record = string -> int -> string -> Record -> Record

This function takes a name of the ‘col’ ( parameter ) , a number or a string and the existing record. The with operator creates a new Record based on existing one with specified field added to it.

let addCol name number string record =
        match name with
        | "WOJ" -> { record with voivoship = number }
        | "POW" -> { record with district = number }
        | "GMI" -> { record with community = number }
        | "RODZ" -> { record with rozd = number }
        | "NAZWA" -> { record with name = string }
        | "NAZDOD" ->  { record with addinf = string }
        | _ -> record

Now, I can write code like this.

let newRecord = createRecord
let recordWithNewValue = addCol "WOJ" 1 None newRecord

This code creates newRecord and then adds voivoship value to it. All the existing fields are still set to ‘None’.

From loop to recursion

In the original code there is a loop used to change the state of the object. This is a perfect approach in procedural statefull approach. You just add new fields to the object. However when, stateless approach is needed, loop won’t fit it due to its dependence on external state change. Loop doesn’t return anything and it can only change existing state out of its scope. If you want to change something in the loop it has to be mutable.

If you want to loop some collection and change something without mutable value then you might use recursion. The good thing about it is that recursion is a function that has to return something and you can return new state. There is no need for mutable values.

To change the loop from original code to recursion, I need to define a recursive function.

let rec iterateCols record (cols:List<InputXml.Col>) =
	match cols with
		| head::tail -> iterateCols (addCol head.Name head.Number head.String record) tail
		| [] -> record
  • rec means that this function is recursive
  • this function takes an existing record and list of columns
  • it loops through the list of columns using match
  • 'head::tail' is a list deconstruction convention using cons operator - 'head' will be the first element while 'tail' is the rest of the collection
  • to do recursive call, I invoke the same function passing record created by addCol function ( introduced earlier in the post ) and rest of the collection ( 'tail' )
  • if the list is empty i return the record
  • this is a very common pattern in functional programming

To bind everything together, there is a function that initiates the recursion

let createRecordBasedOnRow cols =
	let rec iterateCols record (cols:List<InputXml.Col>) =
		match cols with
			| head::tail -> iterateCols (addCol head.Name head.Number head.String record) tail
			| [] -> record
	iterateCols createRecord cols
  • I do declare a function iterateCols/li>
  • and then invoke it with initial parameters ( an empty record ) and ( list of columns )
  • </ul> Sequence now returns a collection of Rows with values from Columns
    let records =
    	seq {
    		for row in tercs.Rows do      
    			yield (createRecordBasedOnRow (row.Cols |> Array.toList))
    		}
    That's all. Full code is available here.

    Recursion vs Loop

    When you look at the original code and new code, it might look less readable. I do find recursion more readable now, but I got used to it. Normal for loop might be more readable in some cases and you have to always ask a questions like.
    • Do, I need stateless code in here ?
    •  What are the benefits of making this code more 'pure-functional'
    Like everything in our field, there is no silver bullet and we have to be pragmatic. Feel free to leave a comments with your opinions about this code. I would love to hear more suggestion and ideas on how to approach problems like this.