Solving Problems With Pure Functions, Types, And EOP

Introduction

Writing Scala code like this gets to be so much like algebra that you can imagine walking in on Albert Einstein writing at a chalkboard, and following his equations one at a time until he joins them all together at the end.

To demonstrate this, let’s walk through the process of writing a “word occurrence” algorithm using only pure functions, immutable variables, immutable data structures, and EOP. I think you’ll see that we can simply talk about this as we sketch out a series of pure function signatures — just like writing algebra on a chalkboard.

This example was inspired by a Haskell book. I’d like to give it credit, but I haven’t been able to remember which book I saw it in.

Source code

If you want to work with this example’s source code, you can find it here in the book’s Github repository, under the WordCount folder:

Problem statement

The first thing to do in any problem-solving effort is to write a clear problem statement:

Given a document that’s represented as a single Scala String, count the number of occurrences of each word in that document.

In addition to that statement, we’re also shown that the output of the algorithm should look like this, with the words in the first column and the number of occurrences in the second column:

the    543
an     210
or     99
(more words here ...)

Okay, that seems good enough for now. So, how do we start?

Getting started

The first thing we can say is that we’re going to write a pure function that works like a “word occurrences” or “word count” function. I don’t want to keep writing the word “occurrences,” so to keep this simple I’ll refer to it as a word count function, which I’ll name wc:

def wc

Next, I know from the problem statement that it will take the document as a String input parameter:

def wc(document: String)

Now if I look at the sample output, that sure looks like a Map to me. More specifically, a map where the key is a string, and the value is an integer:

Map[String, Int]

For instance, in Scala, that sample data looks like this Map:

Map(
    "the" -> 543,
    "an" -> 210,
    "or" -> 99
)

Also note that the output is printed in sorted order, so we’ll want to return a VectorMap to indicate to others that there’s something unique about this map:

VectorMap[String, Int]

A VectorMap is a specific type of immutable Map that lets us return the map values in a sorted order.

Knowing this, we can finish the sketch of our main function signature like this:

def wc(document: String): VectorMap[String, Int]

So there we have it, that’s the signature for our main function. Now all we have to do is sketch out some more pure functions that this function will call.

The next steps

There are several ways we could proceed next, but because the requirement is that we just turn a String into a Map, there’s no need to deal with that String as paragraphs and sentences. Instead, we can work with it as is, adjusting it slightly for our needs.

Get rid of the newline characters

The first thing I’ll do is strip all the newline characters from the String. What I’m thinking is that I can replace all those newline characters with blank spaces, and then in a few moments I’ll reduce all occurrences of multiple blank spaces down to single blank spaces.

When you think of how this “get rid of all newline characters” function needs to work, you can imagine that it takes the existing String as its input parameter, so I start with this:

def replaceNewlinesWithBlanks(s: String)

Next, I know that I want to return a String with all of the newline characters replaced with blanks, so I can add String as the function’s return type:

def replaceNewlinesWithBlanks(s: String): String

So that’s a sketch of that function signature. At this point I feel comfortable that this is the first step of my wc function, so I add it to wc’s body:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines =
        replaceNewlinesWithBlanks(document)

    // more code to come ...

Note that I’ll be wrapping my expressions inside wc as shown because (a) I’m intentionally using long, meaningful function names, and (b) it’s hard to print wide lines in a book.

Remove formatting characters

The next thing I need to do is to remove all formatting characters from the string — things like periods, exclamation points, question marks, semi-colons, colons, and em dashes. So again, this is going to be a function that takes a String as input and returns a String, so I sketch this:

def stripFormattingCharacters(s: String): String

I really don’t want you thinking about algorithms at the moment — just function signatures. But if you find yourself thinking about algorithms, this function can either delete all of those formatting characters or replace them with blank spaces; either approach will work.

When I add this function to wc’s body, it now looks like this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines =
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)

Eliminate multiple blank spaces

At this point I can’t think of any other changes I want to make to the initial string. You can imagine that we started with a multiline string that looks like this:

Foo bar’s baz?

Buzzle; buzzy. Burf!

and at this point that string should look like this:

Foo bar’s baz  Buzzle buzzy Burf

We’re still working with a String, and all of its words should be separated by one or more blank spaces, so this is a good time to create a function to convert multiple blank spaces into a single blank space. Once again we have a String coming in and a String going out, so I sketch this function like this:

def squeezeBlankSpaces(s: String): String

Then I add that function call to wc to get this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines =
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks =
        squeezeBlankSpaces(stringWithoutFormatting)
    ...

From String to Seq[String]

Now our String should be free of punctuation characters, with each word separated by a single blank space. So the next step is to transform it into a list of words, which in Scala can be represented as a Seq[String], List[String], or Vector[String]. Once we have this list of words, we can later convert it to a Map of word counts.

Seq, List, and Vector are all immutable lists (sequences) that are suited for slightly different purposes, but as long as we’re working with a relatively small document, any of them will work.

Therefore, this function takes a String as input and returns a Seq[String], so its signature looks like this:

def convertStringToListOfWords(s: String): Seq[String]

and when I add it to wc, its body now looks like this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines =
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks =
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    ...

Convert words to the same case

Now that we have a list of words, we need to do one more thing: Either (a) convert them all to the same case — lowercase or uppercase — or (b) perform a case-insensitive comparison of them later on when we count the words. We need to do this to make sure we count words like “Do” and “do” as the same word.

Being lazy, I’ll change all the words to lowercase now. And because we’re now working with a Seq[String] — our list of words — I’ll sketch this function to take a Seq[String] as input, and also return a Seq[String]:

def lowerCaseAllWords(listOfWords: Seq[String]): Seq[String]

With this addition, our wc function looks like this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines =
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks =
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    val lcListOfWords = 
        lowerCaseAllWords(listOfWords)
    ...

From a Seq[String] to a Map

Now that we have a list of words, we’re almost home. The next step is to create a function that converts the list of words into a Map, where that Map contains each unique word as the key, and the count of occurrences of that word as its value.

We don’t have to think about that algorithm yet, we just need to sketch the function signature. We know that it should take a Seq[String] as input, and return a Map[String, Int], so its signature looks like this:

def convertWordListToWordCount(listOfWords: Seq[String]): Map[String, Int]

and now wc looks like this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks =
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    val lcListOfWords = 
        lowerCaseAllWords(listOfWords)
    val initialMap = 
        convertWordListToWordCount(lcListOfWords)
    ...

Sort the Map

As our last step, now we just need to sort that Map by its values, in descending order. For example, the Map may look like this right now:

Map(
    "or" -> 99,
    "the" -> 543,
    "an" -> 210
)

and we want it to look like this:

Map(
    "the" -> 543,
    "an" -> 210,
    "or" -> 99
)

So we need a function that takes a Map[String, Int] as input, and also returns a Map[String, Int], BUT, the returned map needs to be sorted. Scala’s VectorMap can be used for just this purpose, so I sketch this function:

def sortMapByHighestValue(map: Map[String, Int]): VectorMap[String, Int]

and then I add it to wc:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks =
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    val lcListOfWords = 
        lowerCaseAllWords(listOfWords)
    val initialMap = 
        convertWordListToWordCount(lcListOfWords)
    val sortedMap = 
        sortMapByHighestValue(initialMap)
    sortedMap

Note that in this case I add the sortedMap variable to the end of the function, because that’s the value that our function yields.

NOTE: In functional programming, FPers don’t say “return”; we say that a function yields a result, or that it evaluates to a result. Again, we’re writing algebra here, and that’s what algebraic equations do.

Is that code hard to read?

Great questions at this point are:

  • Is that code hard to read?
  • Is it hard to reason about?

In both cases I suspect that you’ll say that this code is actually about as simple as it gets. Because the functions are pure and the variables and data types are immutable, the code ends up being written in an EOP style, which I find to be as simple as possible.

I don’t like to use the word “simple” in programming books, because everyone’s background is different. So that’s why I say, “as simple as possible.” Frankly, the code reminds me of writing BASIC code with pure functions when I was in high school.

Discussion

So this is pretty cool. Assuming that I thought of everything, we almost completely solved this problem by (a) sketching out a series of pure function signatures, and then (b) combining those signatures like a series of algebraic equations. All that’s left to do is write the code inside those function bodies.

One thing that’s worth mentioning is that as you go through this process, it can help to jot down a sample of the input you expect a function to receive, as well as the output it’s expected to return. That’s why I showed the String and Map data in a few steps there, to make sure that my logic was sound.

Two notes

Two notes come to mind after writing this section of the book.

1) Being a mathematician or architect

The first note is to reiterare what I’ve already stated: that FPers like to think of this style of programming as math, or perhaps as creating a blueprint. So rather than being a programmer, you’re like a mathematician or architect, working on equations or the blueprint. Because of this, you’ll also hear FPers say things like “reasoning about the code,” because they look at function signatures like these and how they glue together.

As you’ll see throughout this book, when your functions are pure and your values are immutable, you can do this.

2) You can do much more with Scala types

A second thing I need to mention is that with the Scala type system, you can do some things to make your code even more readable. For instance, in my code I return a VectorMap from a function, but if you think about that signature, it could be more meaningful, right?

What I mean is that I wrote this:

VectorMap[String, Int]

but with Scala’s type system I can write that return type like this instead:

MapSortedByValueDesc[String, Int]

Scala lets you create more meaningful type names like this, and in this case, anyone who looks at this code now knows that this is a Map that’s sorted by value. That’s pretty cool.

And then you can also take this a step further and make that function signature look like this:

MapSortedByValueDesc[Word, Count]

Whoa, that’s extremely cool. With these new names, anyone who walks in off the street can tell exactly what’s going on with that Map. And this can all be done with Scala.

As I discuss in the Scala Cookbook (2nd Edition), this sort of naming process is used in Domain-Driven Design, and it makes your types more meaningful.

This also makes for an interesting question: If you walked into someone’s code base, which of these options would you rather see for this particular problem, and why:

  • Map[String, Int]
  • VectorMap[String, Int]
  • MapSortedByValueDesc[String, Int]
  • MapSortedByValueDesc[Word, Count]

The answer is entirely up to you, BUT, I encourage you to form an opinion about it. You can also show it to some fellow developers and ask them what they’d rather see, and why.

All done!

At this point we’re done. Well, at least we’re done sketching out the pure functions we need to solve this problem. All we have to do now is to write the algorithms for the bodies of those functions.

Up until now was the time to think about function signatures. Now is the time to think about how to implement those algorithms.

I’d say that I’ll leave that as “an exercise for the reader,” but you can find those functions implemented in this book’s Github repository.