Monday 18 January 2016

Scala vs. Go TCP Benchmark - why Scala scores so much higher (NOT)

UPDATE: A few days after posting my findings in the comments section of the article, the post was deleted. Click here to view it at webarchive.org.

The claim: Scala is almost 7 times faster in TCP ping-pong

Some colleagues recently brought to my attention this benchmark, which compares Scala and Go performance over TCP. The authors claim an almost seven-fold performance advantage when using Scala to exchange ping-pong messages over TCP! The scenario goes like this:
  • a server process is listening for TCP connections on a port
  • a client process is fired up, which opens 100 TCP connections to the server
  • each of these connections is used by the two processes to exchange 10,000 pairs of 4-byte messages
  • the client sends the 4 bytes "Ping" and as soon as the server receives them, it replies with the 4 bytes "Pong"
Apparently the authors got an average time per ping-pong roundtrip of about 1.6 micros for Scala and 11 micros for Go. To quote:
To our surprise, Scala did quite a bit better than Go, with an average round trip time of ~1.6 microseconds (0.0016 milliseconds) vs. Go’s ~11 microseconds (0.011 milliseconds).

This claim seemed absurd to me. The post is old, however, and there were 38 comments below it; I read through them and none seemed to explain what was happening. Some mentioned memory use due to the 4-byte array being recreated (with such a small footprint I doubt it's an issue). Some mentioned differing socket options/buffer sizes, and one in particular mentioned Nagle's algorithm, but I'm not convinced by that either (the benchmark alternates between reading and writing, so Nagle's algorithm will NOT get a chance to delay transmission). Some mentioned buffering in user space (e.g. use the bufio package in Go) but again that can't be it (the test alternates between reading and writing just FOUR bytes).

In short, none of this made sense, so I decided to investigate.

Verifying the results

Naturally, the first thing I did was run the benchmark myself in disbelief. The process goes like this: you launch the server, which just sits there waiting for connections forever. You then launch the client, which creates the 100 connections, does the 10,000 ping-pongs per connection, and finally prints out an estimate of how many microseconds each ping-pong took. It does this by measuring the entire duration from start to end and dividing by 1 million, the total number of ping-pongs (100 times 10,000).

I should note that I ran all this on a late 2013 MacBook Pro, which has 16GB of RAM and a quad-core i7-4850HQ with hyper-threading. The software used was Scala 2.11.7 on Java 1.8.0_51, and Go 1.5.2. I measured 3 times for each of them (Scala/Go) and picked the best of 3 reported times to publish here. The results are:

Scala:  1.742
Go:     9.887

Ouch! Go appears to be slower by a factor of 5.675 (that's 9.887 divided by 1.742)! I say appears because I simply cannot believe this. Something has got to be wrong here. The next thing I did was measure mixed implementations, which one of the comments had noted gave unexpected results. The "mixed implementation" test is basically to run the Scala client against the Go server and vice versa.

UNIMPORTANT SIDE NOTE: Before I show the mixed implementation test results for my machine, I should mention that I made a minor change in the Scala code to set the connection backlog size to 200. This was the only way I could get the Go client to reliably connect to the Scala server. For more information see the ServerSocket documentation, which explains that the default backlog is 50 pending connection requests, before rejections start happening. This has no significant effect on the measurements: you can repeat them yourself (with and without this change) and you will see that they yield identical results. So here is the code I used for the Scala server in this second experiment, with the comment BACKLOG CHANGE marking the line I modified:

//SERVER
import java.net._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
 
object main{
 
    def handleClient(s: Socket) : Unit = {
      val in = s.getInputStream
      val out = s.getOutputStream
      while(s.isConnected){
        val buffer = Array[Byte](4)
        in.read(buffer)
        out.write("Pong".getBytes)
      }
    }
 
    def main(args: Array[String]){
      val server = new ServerSocket(1201, 200)  // BACKLOG CHANGE
      while(true){
        val s: Socket = server.accept()
        Future { handleClient(s) }
      }
    }
}

Anyway, let's get back to the mixed implementation results. I ran the servers and clients with crossed implementations and here's what I got:

Go server - Scala client:  3.546
Scala server - Go client:  149.671

According to the results of my second experiment, the Scala client took about twice as long (3.546 vs 1.742) when running against the Go server. So it looks like the Go server is so bad that it halves the speed at which the client can run. Let's say you believe that. Explain to me then why the Go client scored an astonishingly bad 149.671 when running against the "faster" Scala server? If the server were really that much faster, it should at least preserve the performance of the Go client, or even improve it (if the bottleneck were really the Go server)! This does not make sense.

Proving that something is broken on the Scala side

At this point I was very suspicious of the Scala code. One of the comments mentioned that there is a bug in the Scala code which defines a buffer of size 1 instead of 4, and so is unable to read an entire "Ping"/"Pong" message in one go. (Indeed, Array[Byte](4) invokes the Array.apply factory method, which creates a one-element array containing the byte value 4, not a four-element array.) This was dismissed on the grounds that "we always send an entire Pong anyway", which didn't convince me either.

I decided to modify the source a bit to actually count and report the number of bytes read by the server. See the comments ADDITION below for my modifications. Basically I add a global counter which gets incremented after every read in a thread-safe manner. The number of bytes actually read is returned by the read() method.

//SERVER
import java.net._
import java.util.concurrent.atomic.AtomicInteger            //ADDITION
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
 
object main{
    val totalRead = new AtomicInteger(0)

    def handleClient(s: Socket) : Unit = {
      val in = s.getInputStream
      val out = s.getOutputStream
      while(s.isConnected){
        val buffer = Array[Byte](4)
        val actuallyRead = in.read(buffer)
        if (actuallyRead > 0)                   // ADDITION
          totalRead.addAndGet(actuallyRead)
        out.write("Pong".getBytes)
      }
    }
 
    def main(args: Array[String]){
      val server = new ServerSocket(1201, 200)  // BACKLOG CHANGE
      var client = 0
      while(true){
        val s: Socket = server.accept()
        client += 1
        val clientId = client
        val f = Future { handleClient(s) } 
        f onComplete {                          // ADDITION
          case _ => println("After client %d, total bytes read: %d".format(clientId, totalRead.get))
        }
      }
    }
}

Now, before you run this, consider for a moment what you'd expect. You have 100 connections, with 10,000 pings coming in from each one. Since each ping is 4 bytes long, you'd expect 40,000 bytes to be read from each connection, for a total of 4,000,000 bytes across all clients. Now fire up the Scala server and then the Scala client. Here's what I saw in my output:

...
After client 98, total bytes read: 996120
After client 99, total bytes read: 999428
After client 100, total bytes read: 1000921

So basically the server has only done a quarter of the expected work in terms of receiving data. We have silently lost about three quarters of the incoming bytes! What's the deal here? Does the client never send them? Or does the server never read them? Well, according to Java's documentation for the read() method, an attempt is made to read at least one byte. It blocks until at least one byte is available, and then reads as many bytes as possible, trying to fill the buffer provided as a parameter. So let's run the Go client against the same Scala server and see what happens:

...
After client 99, total bytes read: 3994164
After client 100, total bytes read: 3994164
After client 90, total bytes read: 3994164
After client 98, total bytes read: 3994164
After client 95, total bytes read: 3994164
After client 73, total bytes read: 3994164
After client 85, total bytes read: 3994164

We get almost all our bytes now, though not quite all of them. Evidently, even though the Scala server is perfectly capable of receiving almost all of the data (even if it reads it one byte at a time), in practice there's something in the Scala client that causes it to stop after reading about a quarter of the stream!

Now, we can see from the server loop that as long as the socket is connected, the server will keep reading incoming data. So it just needs the client to stick around long enough with the connection open: the Scala server will eventually read through all the bytes. We can see that the Go client stays connected long enough to exchange all data, so the problem must be that the Scala client does not stick around to do the entire exchange! If that is the case, the number the Scala client reports must be wrong!

Let's just fix things and do a proper comparison

At this point I decided to simply change both the Scala client and the Scala server to use 4-byte buffers. I'm not so much worried about read() returning less than 4 bytes (as some comments suggested) because we are doing loopback communication here with 4-byte payloads: there's no chance these will be split across multiple TCP segments. I expect read() to always return 4 bytes as long as the buffer is at least that big. So here's the Scala code:

//SERVER
import java.net._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
 
object main{
 
    def handleClient(s: Socket) : Unit = {
      val in = s.getInputStream
      val out = s.getOutputStream
      while(s.isConnected){
        val buffer = Array[Byte](0, 0, 0, 0)        // BUG FIX
        in.read(buffer)
        out.write("Pong".getBytes)
      }
    }
 
    def main(args: Array[String]){
      val server = new ServerSocket(1201, 200)      // BACKLOG CHANGE
      while(true){
        val s: Socket = server.accept()
        Future { handleClient(s) }
      }
    }
}

And here is the fixed client:

//CLIENT
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.language.postfixOps
import java.net._
 
object main{
 
    def ping(timesToPing: Int) : Unit = {
        val socket = new Socket("localhost", 1201)
        val out = socket.getOutputStream
        val in = socket.getInputStream
        for (i <- 0 until timesToPing) {
            out.write("Ping".getBytes)
            val buffer = Array[Byte](0, 0, 0, 0)        // BUG FIX
            in.read(buffer)
        }
        socket.close
    }
 
    def main(args: Array[String]){
        val totalPings = 1000000
        val concurrentConnections = 100
        val pingsPerConnection : Int = totalPings/concurrentConnections
        val actualTotalPings : Int = pingsPerConnection*concurrentConnections
 
        val t0 = (System.currentTimeMillis()).toDouble
        val futures = (0 until concurrentConnections).map{_ =>
            Future(ping(pingsPerConnection))
        }
 
        Await.result(Future.sequence(futures), 1 minutes)
        val t1 = (System.currentTimeMillis()).toDouble
        println(1000*(t1-t0)/actualTotalPings)
    }
}

I now re-ran the benchmark tests and here are the results:

Scala:                     9.622
Go:                        9.887
Go server - Scala client:  11.621
Scala server - Go client:  11.252

Suddenly the difference is negligible, less than 1 microsecond! We just need to make sure we're comparing the same thing! If you want, you can run the version of the Scala server that logs the number of bytes exchanged. You will then see that it is exactly 4,000,000 and that happens for both the Go and the Scala clients.

Explaining the impact of the 1-byte buffer

But why does this buffer size make such a huge difference? The reason is that when the Scala server receives the first ping, it can immediately loop 4 times (reading one byte per iteration) and send 4 pongs to the client, without having to wait for another ping. Similarly, when the Scala client receives the first pong, it can immediately loop 4 times and send 4 more pings, without having to wait for the server to send another pong. Note that this happens for every message: whenever the server receives a ping, it replies with 4 pongs, and whenever the client receives one of those pongs, it sends out 4 new pings instead of one. This means that the client and the server can be running at the same time rather than waiting for each other, due to the ever-increasing size of the bursts. The Go versions, on the other hand, empty the incoming buffer on both sides after each read. This means that when they loop to do the next read, they must wait for their counterparty to receive and send the next message. In effect, in the Go versions, there is never both a client and a server ready to run at the same time on different CPU cores! One always ends up in WAITING state so that the other can run and provide the data it is waiting for. They are running in lock-step, alternating from one to the other!

Another interesting dynamic here is the kernel-space buffering present in TCP. The operating system has a kernel-space buffer for receiving TCP data, which in the Scala case is combined with user-space buffering inside the JVM: the InputStream returned by java.net.Socket.getInputStream() buffers incoming data. The reason this is important is that system calls are expensive. System calls are the way the operating system kernel communicates and exchanges data with user processes. When a process has to fetch incoming data from a TCP socket, it executes a system call. On the x86 architecture this means a software interrupt, causing a jump to the interrupt handler the kernel has installed, with the CPU switching from user mode to kernel mode, saving process state, copying memory between kernel and user space, and so on... The JVM wants to avoid this as much as possible, so it maintains a large buffer in user space to get the most out of each system call. Calls to read() on the InputStream are served from the user-space buffer until it is depleted, at which point another expensive system call is made. The Go version does not have user-space buffering, but as you will soon see (read on), it would not have made a difference anyway.

So back to the system calls and the buffering in Scala: when the Scala bug is present, after the first ping is sent by its client, the server makes a system call and receives 4 bytes in user space. Due to the bug it reads them one byte at a time from the user-space buffer, so it immediately loops 4 times and sends 4 pongs worth 16 bytes to the client in a burst. Now, the client is waiting for a pong and when its read() executes it gets 4 of them! These 16 bytes are moved to user space by the InputStream in a SINGLE system call, allowing the client to then loop SIXTEEN times without waiting or making a system call, and send 16 new 4-byte pings worth 64 bytes! Every byte read produces a 4-byte reply, so the data in flight quadruples at each hop, and a full roundtrip multiplies it by 16. So the first system call of the server gets 4 bytes, the next one 64, the next one 1024, and so on. Similarly, the client side gets 16 bytes, then 256, then 4096, and so on! Scala makes far fewer expensive system calls, as more and more data is moved between kernel space and user space with each one of them. Meanwhile, the Go code sends and receives all 4 bytes each time, draining the buffers completely, and doing exactly 10,000 expensive system calls on each side per connection, as intended. In fact, after each iteration, both server and client WAIT until the data is copied by the operating system across the TCP connection.

I should note, however, that even if buffered I/O were used with Go, it would not make a difference: each party waits for the entire ping (server) or pong (client) message to arrive before sending its reply. So even if user-space buffering were present in Go, it would never get a chance to carry more than 4 bytes over from kernel space to user space, because there would never be more than 4 bytes available! The Go version (unlike the buggy Scala version) always has exactly 4 bytes in transit in either direction! The fact that user-space buffering isn't important becomes evident when we fix the Scala code: there's no significant performance advantage. User-space buffering only matters if there's more data travelling on the TCP channel than the client side asks to read (which was the case with the single-byte buffers in Scala).

Executive Summary

The reason for the perceived faster Scala implementation was the single-byte buffer. This had the following effects:

  1. The Scala client sent bursts of increasing size (1, 16, 256 and so on) of pings to the server, which replied in a similar fashion (4, 64, 1024 and so on) with pong bursts. This allowed the two to run concurrently, instead of waiting for each other. In contrast, the Go versions were running in alternating fashion, with each client and server go-routine switching between READY and WAITING state.
  2. Not only did the Scala client never switch to WAITING state during its 10,000 iterations, it also stopped immediately after reading just 10,000 bytes from each connection instead of 40,000 (because each iteration reads just one byte). At that point it would STOP and report the time it took to read those 10,000 bytes from each connection.
  3. During that time the user-space buffering of the Scala client combined with the increasing size of each burst, causing Scala to make far fewer system calls for copying data between user-space and kernel-space buffers. The Go version would make exactly 10,000 system calls for reading on each side of each connection, whereas the Scala version probably made fewer than 10!

As soon as the Scala version is fixed, it immediately drops to performance comparable to Go. The difference of less than 1 microsecond on my machine is negligible.

Concluding remarks

This "benchmark" is really about stress-testing the underlying operating system. It says nothing about Go or Scala. You could write the same code in any half-way decent language (Java, C++, Python, Ruby, etc) and get almost identical results (that is, as long as the two implementations are really doing the same thing).

The language is not really important in this case. It's the underlying OS that is doing most of the work here. Our processes spend 99% of their time either waiting for the I/O to complete, or in kernel-space, executing Operating System code for transferring data. That code is NOT the code that we wrote, but rather the code of our OS which was most likely written in C or C++, and is common no matter what language you used to write the benchmark.

The only thing this can be used for is to take an implementation of the test (the proper bug-free Scala one, or the Go one) and run it on the same hardware but with different underlying OS (Linux, FreeBSD, OS X, Windows...) to see which TCP/IP stack works faster for "loopback" (on same host) communication.

Or perhaps you could use it to fine-tune settings for the OS to improve the performance. In this case it would simply be a tool to review what impact changes of various networking-stack-related settings of the OS would have on I/O workloads such as this one (small alternating messages).

Wednesday 4 February 2015

SBT (Scala Build Tool): a getting started guide for veteran newbs

Go forth and build! ...my code!

Sooooo... Scala Build Tool.

Like all things with me recently, it started out by reading a book about Scala while commuting to work (riding the tube for 2 hours daily gives you a LOT of time to read). Now, this is like... the 8th (?) programming language I've learned in my life? Anyway, suffice to say, by the time I got around to feeding my OCD and writing the obligatory "Hello World" using Scala IDE for Eclipse, I probably knew more Scala than was good for me.

Anyway, I was determined to mingle with the cool kids, so I started looking at Akka's actors. Within minutes, I was overwhelmed by a desire to use Akka for re-enacting a movie scene. So here it is, from The Big Lebowski, a scene with The Dude (Akka actor Jeff Bridges) and Jackie (Akka actor Ben Gazzara):

object BigLebowski extends App {
  implicit val system = ActorSystem("the-stage")
  // our actors
  val gazzara = system.actorOf(Props[BenGazzara])
  val bridges = system.actorOf(Props[JeffBridges])

  // the roles
  val jackie = new Movie.Role("Jackie Treehorn")
  val dude = new Movie.Role("The Dude")

  // the script for our scene
  val script = Movie.Script("BigLebowski-Scene-1",
    // gazzara plays jackie, and bridges plays "the dude"
    Map(gazzara -> jackie, bridges -> dude),
    // here's the scene's dialogue:
    (jackie, "Interactive erotic software. The wave of the future, Dude. " +
      "One hundred percent electronic!"),
    (dude, "Yeah well, I still jerk off manually."))

  // director distributes script, and... Action!
  system.actorSelection("/user/*") ! script
}
Unfortunately, we had a problem. Well, two actually, but let's ignore for a moment that the cool kids think 90s movies are lame. From a technical perspective, my problem was that as soon as I ventured outside the standard library, I needed a tool to manage my build. Something that would look up and download dependencies, compile code in the proper order (using the appropriate classpath), etc. And thus the cool kids pointed me to SBT: the Scala Build Tool.

Hello, SBT

XML is so very lame, that the cool kids don't even know what that is. It is rumored an old man once tried to explain it to them, but none of them paid attention (except the few that died of boredom 30 seconds into the explanation). The net outcome was that the cool kids never knew the joys of Maven and Ant, but at least they lived to write tools like Gradle and SBT. But this is all beside the point. The point is, you wanna be jaw-dropping-awesome. And here's how you do it:

version := "0.0.1"

name := "big-lebowski-movie"

scalaVersion  := "2.11.4"

resolvers += Opts.resolver.mavenLocalFile

resolvers += Resolver.typesafeIvyRepo("releases")

libraryDependencies ++= {
  val akkaV = "2.3.8"
  Seq(
    "com.typesafe.akka"          %%  "akka-actor"              % akkaV
  )
}
The file above is called build.sbt and should be placed at the root folder of your project (by the way, no need to copy-paste: you can fetch the entire code from my github repository, here). You should have a look over there now. Ignore for a moment the project folder and notice how your Scala sources should reside in relative path src/main/scala. So basically, just add that build.sbt and you're ready to rock. But let's go over it now line-by-line, to see what it all means.

SBT's model and API

The first thing you should know about this file is that the content is actually regular Scala: there is no such thing as some special SBT language that you must learn to define your project build. The theory goes... if you know Scala you already know how to write a build.sbt file. But that's only half the truth really. It is half the truth because there is such a thing as an SBT library, which defines a Scala API for describing your build. So to write a build.sbt you need to know Scala and also the SBT library API.

Which brings us to the second thing you need to know about build.sbt: the following lines are implied to exist at the very beginning of the build.sbt file:

import sbt._
import Process._
import Keys._

// actual content of build.sbt follows here
Now, these statements pull into scope that magic SBT library API so you can use things like the := operator prevalent in the first few lines, the %% and % methods used in the line close to the bottom, etc. All of these are actually methods defined by the SBT library and implicitly brought in scope using the implied import lines above. We next go over each line and discuss the SBT API being used.

It's all keys and settings

Here's an inconvenient truth: suppose you could disregard for a moment all the coolness of SBT; what do you think you'd be left with? It may come as a surprise, but you'd actually find yourself looking at a Map data structure. Yes, I mean that simple data structure that associates keys with values. But in the case of SBT, we have several pre-defined keys with special meaning, and values that are instances of the Scala class Setting[T] (defined by the SBT library). So here's that file again, this time with enlightening comments:
// the key "version" is assigned a Setting[String] value of "0.0.1"
version := "0.0.1"

// the key "name" is assigned a Setting[String] value of "big-lebowski-movie"
name := "big-lebowski-movie"

// the key "scalaVersion" is assigned a Setting[String] value of "2.11.4"
scalaVersion  := "2.11.4"

// we'll discuss the rest later
As you can see, each line associates a key with a value. The keys used -- version, name, scalaVersion -- are not random. SBT defines these keys in its library and uses their assigned values for specific purposes during the build. You can probably guess that name and version refer to your project and what you want to call it. They are used for things like the name of the JAR file generated when you package your project. The key scalaVersion on the other hand tells SBT what version of Scala to use when compiling your code. The := is actually a method that is defined on SBT's Key class, which accepts a single argument and assigns a Setting[T] value to the key. Therefore the statement name := "big-lebowski-movie" is in fact the Scala expression name.:=("big-lebowski-movie"), assigning a Setting[String] value to the key name. No magic there...

We now discuss the special key resolvers, along with the += method. We have invoked resolvers.+=(...) twice:

// this tells SBT to look for dependencies in your local maven repository
resolvers += Opts.resolver.mavenLocalFile
// above line is actually resolvers.+=(Opts.resolver.mavenLocalFile)

// this tells SBT to look for dependencies in TypeSafe's Ivy repository
// at https://repo.typesafe.com/typesafe/ivy-releases
resolvers += Resolver.typesafeIvyRepo("releases")
// above line is actually resolvers.+=(Resolver.typesafeIvyRepo("releases"))
The special key resolvers tells SBT where to look for library dependencies needed to build your projects. The same way the := method replaces a key's Setting value with a new one, the += appends an extra value to the existing one. So in this case, we add two sources for resolving dependencies, namely our local maven repository and TypeSafe's Ivy repository (the latest releases of Akka are there). The parameters used in each case are also from SBT's library. There really is a Resolver.typesafeIvyRepo(s:String) defined among others in there. You can see a list of these predefined settings in the SBT documentation.

And the final key we used is libraryDependencies, whose value is a Setting[Seq[ModuleID]], where ModuleID is an SBT class representing a dependency:

// libraryDependencies.++=(...)
libraryDependencies ++= {
  // the curly braces are simply a Scala block of code. therefore the block's
  // value is equal to the last expression in the block

  // but first, we assign a value variable
  val akkaV = "2.3.8"

  // try adding this line and running sbt again:
  // println("YES, IT'S REALLY JUST SCALA CODE!")

  // this is the last expression of the code block.
  // as such, the entire block evaluates to the statement below
  Seq(
    // the three strings below are the organization, name and revision respectively
    // the %% and % methods create a ModuleID object from the 3 strings
    "com.typesafe.akka"          %%  "akka-actor"              % akkaV
    // so the net result is a Seq[ModuleID], with a single element in it
  )
}
The ++= method works on keys whose value is a sequence of something (i.e. Setting[Seq[T]]) and is used to append a sequence of elements to the existing value. In our case we append a sequence with a single dependency (akka-actor).

The %% and % methods are used to construct a ModuleID object using three String values. Notice how we use the variable akkaV for the revision. In an actual use case where we might pull in several Akka modules, keeping the version in a single place will make it easy to change to another version.
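For instance, a hypothetical extension of the build above (akka-remote and akka-testkit are real Akka modules, but their inclusion here is purely illustrative) shows how bumping akkaV upgrades everything at once:

```scala
libraryDependencies ++= {
  val akkaV = "2.3.8"
  Seq(
    // all three modules track the single akkaV value
    "com.typesafe.akka" %% "akka-actor"   % akkaV,
    "com.typesafe.akka" %% "akka-remote"  % akkaV,
    "com.typesafe.akka" %% "akka-testkit" % akkaV % "test"  // test-only dependency
  )
}
```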

Blank lines of coolness

The only open point right now should be the blank lines. If you look back at our original file, you will see that each key-setting definition is followed by one blank line. This is NOT a coincidence. In fact, if you remove one of them, you will see SBT complain while parsing the file. The reason is that SBT expects build.sbt to contain a list of key-setting expressions (as opposed to statements), so some separator is needed, and the blank line is what SBT chose as its delimiter. You can read all about it here.

Conclusion

A few last words. Like other build tools, SBT is extensible. Several plug-ins exist that extend its capabilities. One way to add plugins is by using the project/plugins.sbt file, which we use to install a plugin called sbt-assembly:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

This adds the capability to assemble a fat JAR with all of your code and its dependencies. If you run sbt assembly you will find a target/scala-2.11/big-lebowski-movie-assembly-0.0.1.jar archive including everything needed by your project. More information can be found on this plugin's web page.

Another plugin you may find useful if you use Scala IDE for Eclipse is the sbteclipse-plugin. This will generate Eclipse project files for your code so that you may work with it in Eclipse. You just need to run sbt eclipse and the required files (.project and .classpath) will be generated. All you need is to add the plugin:

addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "3.0.0")

I hope you found this post useful.