Why calling Source.mkString() is a very bad idea

Shai Yallin | April 11th 2013 | Scala

We have some Scala code that consumes text from the InputStream of an HTTP response. Like any good Scala developer, I handed the response over to a function that returns a String. This function performs some validations, such as checking that the response status is 200, and then consumes the InputStream by creating a scala.io.Source from it and calling source.mkString to read the response body.

Or so I thought.

Apparently, scala.io.Source is an Iterator[Char] and inherits its mkString() from TraversableOnce. TraversableOnce.mkString() appends every element of the collection to a StringBuilder, one element at a time. That is fine for an in-memory collection, but when the instance is actually an abstraction over an IO-bound stream, it means the stream is consumed one character at a time. As some of you might know, this is a terribly inefficient way to consume an InputStream, especially a network-bound one. When we ran a thread profiler, we were horrified to discover that we were spending almost 70% of our time waiting on IO while consuming these HTTP responses.
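To make the one-element-at-a-time behaviour concrete, here is a small sketch that wraps a Source in a counting iterator and calls the generic mkString on the wrapper, so it exercises the inherited Iterator/TraversableOnce implementation rather than anything Source-specific. CountingIterator is a hypothetical helper written for this illustration, and an in-memory Source.fromString stands in for the HTTP stream.

```scala
import scala.io.Source

object MkStringCallCount {

  // Hypothetical helper: forwards to an underlying iterator and counts next() calls
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var nextCalls = 0
    def hasNext: Boolean = underlying.hasNext
    def next(): A = { nextCalls += 1; underlying.next() }
  }

  def main(args: Array[String]): Unit = {
    // Source is an Iterator[Char], so it can be wrapped like any other iterator
    val counting = new CountingIterator[Char](Source.fromString("hello\nworld"))
    // The generic mkString pulls the characters out one by one
    val text = counting.mkString
    // prints "read 11 chars with 11 calls to next()"
    println(s"read ${text.length} chars with ${counting.nextCalls} calls to next()")
  }
}
```

Every character costs a separate next() call; with an IO-backed iterator, each of those calls goes through the stream abstraction.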

The solution?

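Source.fromInputStream(response.getInputStream, "UTF-8").getLines().mkString

getLines() reads the stream a whole line at a time, but it also strips the line terminators, so this joins the lines with nothing between them. If the newlines matter, pass a separator, as in mkString("\n") (a trailing newline is still lost).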
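If the body has to come back exactly as it was sent, line terminators included, another option (a different technique from the getLines() fix, not the one used in this post) is to read the stream through a java.io.Reader into a fixed-size char buffer, so each read() call moves a whole chunk rather than a single character. This is a minimal sketch that assumes a UTF-8 body; the ChunkedRead object, the readAll name and the 8192-character default buffer are hypothetical choices.

```scala
import java.io.{InputStream, InputStreamReader, Reader}
import java.nio.charset.StandardCharsets

object ChunkedRead {

  // Hypothetical helper: decodes the whole stream as UTF-8 in bufferSize-char chunks.
  // Closing the reader also closes the underlying InputStream.
  def readAll(in: InputStream, bufferSize: Int = 8192): String = {
    require(bufferSize > 0, "bufferSize must be positive")
    val reader: Reader = new InputStreamReader(in, StandardCharsets.UTF_8)
    try {
      val sb = new java.lang.StringBuilder
      val buf = new Array[Char](bufferSize)
      var n = reader.read(buf) // returns -1 at end of stream
      while (n != -1) {
        sb.append(buf, 0, n) // append only the chars actually read
        n = reader.read(buf)
      }
      sb.toString
    } finally reader.close()
  }
}
```

Unlike getLines().mkString, this keeps \r\n sequences and trailing newlines intact, at the cost of holding the whole body in memory (which mkString does anyway).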

Test code I used to prove the problem:

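package com.wixpress.scala

import java.io.ByteArrayInputStream
import scala.io.Source

object SourceBenchmark {

  def main(args: Array[String]): Unit = {
    // 1 MB of zero bytes (no newline characters, so getLines() yields a single line)
    val bytes = new Array[Byte](1024 * 1024)

    // Each measurement gets its own stream: a single shared ByteArrayInputStream
    // would already be exhausted by the first run, so the second would read nothing.
    val mkStringTime = measureInNanos {
      Source.fromInputStream(new ByteArrayInputStream(bytes), "UTF-8").mkString
    }

    val getLinesTime = measureInNanos {
      Source.fromInputStream(new ByteArrayInputStream(bytes), "UTF-8").getLines().mkString
    }

    println("Time with mkString: %s, time with getLines: %s".format(mkStringTime, getLinesTime))
  }

  // Evaluates the by-name block once and returns the elapsed wall-clock time in nanoseconds
  def measureInNanos(f: => Any): Long = {
    val before = System.nanoTime()
    f
    System.nanoTime() - before
  }
}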

On my late 2011 MacBook Pro, this printed:

Time with mkString: 548707000, time with getLines: 1868000

That is a difference of roughly 290x in favor of the getLines() version!
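Single-shot timings on the JVM are sensitive to JIT warm-up and GC pauses, so a steadier comparison repeats each measurement. The following is a minimal sketch, assuming best-of-N is an acceptable summary statistic: it keeps the fastest of several runs. bestOfNanos, freshSource and the run count of 5 are hypothetical choices, not part of the original test; a dedicated harness such as JMH would be more rigorous still.

```scala
import java.io.ByteArrayInputStream
import scala.io.Source

object RepeatedSourceBenchmark {

  // Hypothetical helper: evaluates `body` `runs` times and returns the fastest
  // elapsed time in nanoseconds. Keeping the minimum filters out runs slowed
  // down by JIT compilation or GC pauses.
  def bestOfNanos(runs: Int)(body: => Any): Long = {
    require(runs > 0, "runs must be positive")
    (1 to runs).map { _ =>
      val before = System.nanoTime()
      body // a by-name parameter is re-evaluated on every reference
      System.nanoTime() - before
    }.min
  }

  def main(args: Array[String]): Unit = {
    val bytes = new Array[Byte](1024 * 1024)

    // Each run consumes its stream completely, so every run gets a fresh one
    def freshSource(): Source =
      Source.fromInputStream(new ByteArrayInputStream(bytes), "UTF-8")

    val mkStringBest = bestOfNanos(5)(freshSource().mkString)
    val getLinesBest = bestOfNanos(5)(freshSource().getLines().mkString)

    println(s"best of 5 with mkString: $mkStringBest ns, with getLines: $getLinesBest ns")
  }
}
```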

Edit: I have created an issue in the Scala issue tracker for this bug.


By Shai Yallin
A seasoned software engineer focusing on JVM-based languages such as Java and Scala, and an avid advocate of clean code, continuous delivery and TDD.
Wix
