Thursday, February 14, 2013

String Interpolation in Scala 2.10

I spoke at a recent ScalaSyd meeting about string interpolation in the Scala 2.10 release. You can find the slides here, but since there is little in the way of explanation in them, I’ll be blogging here to explain the examples.

I begin in this post with the basics of string interpolation via processed strings in Scala 2.10. Later posts show how to write your own interpolators that implement custom value construction as well as pattern matching. The final posts show how macros can be used to get compile-time guarantees about the content of your processed strings.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

The story so far

Scala before 2.10 supports basic string literals delimited by double quotes. I'll call this the single-quoted form, since one quote character appears at each end. Single-quoted literals can contain escape sequences but not newlines.

"A string on one line"

A triple-quoted form is also available in which non-Unicode escape sequences are not interpreted and newlines can occur.

"""A long string with only Unicode escapes and

possibly newlines in it"""

Commonly used techniques for constructing strings include using the plus operator to concatenate the pieces.

"The " + animal1 + " jumped over the " + animal2

Fans of printf-style string formatting can use the format method to build their strings.

"The %s jumped over the %s".format (animal1, animal2)

Processed strings

Scala 2.10 extended the syntax by adding processed strings that allow expression values to be interpolated (inserted) into the middle of a string literal.

Interpolation is requested by prefixing a string literal with an identifier. There must be nothing between the identifier and the literal. The examples I show use the single-quoted form of string literal, but triple-quoted strings can also be processed in an analogous fashion.

Processed strings can include arbitrary expressions marked by dollar signs. The way in which the dollar-marked expressions are processed depends on the details of the interpolator.

For example, the Scala library provides an interpolator called s that can be used as follows.

val answer = 42
println (s"answer is $answer, dollar is $$")

val animal1 = "fox"
val animal2 = "dog"
println (s"The $animal1 jumped over the $animal2")

println (s"One plus one is ${1 + 1}")

println (s"The inserted expressions are blocks ${
  val x = "!"
  x * 3
}")

This prints the following.

answer is 42, dollar is $
The fox jumped over the dog
One plus one is 2
The inserted expressions are blocks !!!

The s interpolator produces a string value by concatenating the constant parts of the string literal (after interpreting escape sequences) with the values of the embedded expressions. A dollar sign is obtained in the output by including two consecutive dollar signs in the literal. If the expression marked by a dollar sign is more than a single identifier it must be enclosed in braces. The final example shows that the expressions are actually block expressions so they can contain local declarations.
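To make the rule about braces concrete, here is a small sketch. The object name BracesAndTripleQuotes and the values in it are just for illustration. It also shows s applied to a triple-quoted literal, as mentioned earlier.

```scala
object BracesAndTripleQuotes {
  val msg = "G'day!"

  // Without braces only the identifier msg is interpolated. The ".length"
  // that follows is ordinary text, giving "G'day!.length".
  val withoutBraces: String = s"$msg.length"

  // With braces the whole expression msg.length is evaluated, giving "6".
  val withBraces: String = s"${msg.length}"

  // The same interpolator applied to a triple-quoted literal that
  // spans two lines.
  val multiLine: String = s"""msg is $msg
and its length is ${msg.length}"""

  def main(args: Array[String]): Unit = {
    println(withoutBraces)
    println(withBraces)
    println(multiLine)
  }
}
```

Forgetting the braces is not an error as far as the compiler is concerned, since $msg on its own is a perfectly good interpolation, so it is worth checking the output when a member access follows a dollar sign.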

The Scala library also contains an interpolator called raw that behaves just like s except that it doesn’t interpret escape sequences in the constant parts of the string literal.
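A quick sketch of the difference, using a tab escape. The names RawVersusS, processed and unprocessed are mine and only there for the example.

```scala
object RawVersusS {
  val n = 3

  // s interprets \t, so the result contains a single tab character.
  val processed: String = s"$n\tcolumns"

  // raw leaves the escape alone, so the result contains a backslash
  // followed by the letter t. The embedded expression is still inserted.
  val unprocessed: String = raw"$n\tcolumns"

  def main(args: Array[String]): Unit = {
    println(processed)
    println(unprocessed)
  }
}
```

The raw version is one character longer than the s version, since the escape stays as two characters instead of becoming one tab.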

It is important to realise that the expressions embedded in a processed string are checked in the usual way by the Scala compiler. Errors will be reported at compile time.

Formatted interpolation

The string interpolation equivalent of the format method is provided by the Scala library’s f interpolator.

val pi = 3.14159
println (f"pi ($pi) = $pi%1.3f")

val msg = "G'day!"
println (f"msg.length   = ${msg.length}%5d")

This prints the following.

pi (3.14159) = 3.142
msg.length   =     6

The difference between s/raw and f is that in an f string the embedded expressions can be followed by format specifiers beginning with a percent sign. If no format specifier is given for an expression then it defaults to %s so the string value of the expression is used.
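As a small illustration of the default, the following sketch builds the same text three ways. The object and value names are made up for the example. I haven't written the output in the comments, since the decimal separator depends on the default locale.

```scala
object FormatDefaults {
  val name = "pi"
  val value = 3.14159

  // name has no specifier, so it is treated as if %s had been written.
  val implicitSpec: String = f"$name is about $value%1.2f"

  // The same string with the %s written out explicitly.
  val explicitSpec: String = f"$name%s is about $value%1.2f"

  // The equivalent call to the format method, for comparison.
  val viaFormat: String = "%s is about %1.2f".format(name, value)

  def main(args: Array[String]): Unit = {
    println(implicitSpec)
    println(explicitSpec)
    println(viaFormat)
  }
}
```

All three values should come out the same, which is the sense in which f is the interpolation equivalent of format.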

An interesting aspect of f is that the interpolation process checks compatibility between the embedded expressions and the format specifiers at compile time. For example, if msg is a string, the following will not compile since a string cannot be formatted as a floating-point value.

f"msg can't be formatted as $msg%1.3f"

This kind of checking goes above and beyond the normal checking that the compiler will do. In the example, the compiler will ensure as usual that msg is in scope at the point where it is used in the string. The extra checking to make sure that msg is compatible with %1.3f is performed by the interpolation process. Since we want the extra checking to be performed at compile time, it is implemented by a macro that augments the compiler’s capabilities. I’ll show some examples of using macros for this kind of checking later in this series.
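For contrast, here is a sketch of the same mistake made with the format method, which has no such macro behind it. The mismatch compiles without complaint and only surfaces when the code runs. I've wrapped the call in scala.util.Try so the failure can be inspected; the object name and the attempt value are just for illustration.

```scala
import scala.util.Try

object RuntimeFormatFailure {
  val msg = "G'day!"

  // format accepts arguments of any type, so this compiles. The clash
  // between the String argument and %1.3f is only detected at run time,
  // where the formatter throws an exception that Try captures.
  val attempt: Try[String] =
    Try("msg can't be formatted as %1.3f".format(msg))

  def main(args: Array[String]): Unit =
    println(attempt) // a Failure wrapping the formatting exception
}
```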

What’s next?

As you might expect, the interpolators s, raw and f are not particularly special. It’s easy to write your own and in the next post I’ll show you how.
