Tom's Programming Blog

Monday, 27 February 2012

Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I've yet to explore. So thought I ought to take a closer look, and compare with StringUtils, and I have to admit I was surprised at what I found.

Splitting strings eh? There can't be many different ways of doing this surely?

Well Guava and StringUtils do take a sylisticly different approach. Lets start with the basic usage.

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three",',');

// Guava splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");

So, my first observation is that Splitter is more object orientated. You have to create a splitter object, which you then use to do the splitting. Whereas the StringUtils splitter methods uses a more functional style, with static methods.

Here I much prefer Splitter. Need a reusable splitter that splits comma separated lists? A splitter that also trims leading and trailing white space, and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyString()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); //"one","two","three"
niceCommaSplitter.split("  four  ,  five  "); //"four","five"

That looks really useful, any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

Don't really see that making much of a difference, most of the time I just want to loop through the tokens in order anyway!

I also didn't think it was a big deal, until I examined the performance of the two approaches. To do this I tried running the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
    StringUtils.split(numberList , ',');   
}
System.out.println(System.currentTimeMillis() - start);
  
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
    Splitter.on(',').split(numberList );
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times:

594
31

Guava's Splitter is almost 10 times faster!

Now this is a much bigger difference than I was expecting, Splitter is over 10 times faster than StringUtils. How can this be? Well, I suspect it's something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings! So Splitter doesn't actually need to create new String objects.

It's also worth noting you can cache your Splitter object, which results in an even faster runtime.

Blimey, end of argument? Guava's Splitter wins every time?

Hold on a second. This isn't quite the full story. Notice we're not actually doing anything with the result of the Strings? Like I mentioned, it looks like the Splitter isn't actually creating any new Strings. I suspect it's actually deferring this to the Iterator object it returns.

So can we test this?

Sure thing. Here's some code to repeatedly check the lengths of the generated substrings:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
  final String[] numbers = StringUtils.split(numberList, ',');
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);
  
Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
  Iterable<String> numbers = splitter.split(numberList);
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:


609

2048

Guava's Splitter is almost 4 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. Looks like by returning an Iterable, Splitter is trading immediate gains, for longer term pain. There's also a moral here about making sure performance tests are actually testing something useful.

In conclusion I think I'll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still I was surprised by the result, and if you're splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.

Monday, 6 February 2012

Useful SVN commands

Usefull SVN commands

Have been increasingly using command line SVN these days. Find it's just a bit quicker and more reliable than the GUI clients I had been using. I've mainly written this for myself, as a quick reference for the commands I use most frequently

svn help COMMAND

Displays help for a particular svn command.
Lists all available commands if none is specified

svn co URL[@REV]

Check out: Creates a local working copy of the repository found at the URL

svn log

Provides a log of commit messages
Use with -l 10 to limit to 10 most recent messages
Use -v for verbose output (lists changed files)
Use -r 12345 to get info on particular revision

svn info

Provides useful information about the current working copy, such as repository URL, and current revision

svn status

Provides a list of diffences between the working copy, and the repostory. Take a look at the help (svn help status to find out what the different column values mean)

svn diff

Displays local modifications
Use -r N:M to display differences between two revisions
Use -c to see the changes for a particular revision

svn revert PATH

Reverts the specified path to the contents of the repository. You will lost any local changes!
Not recursive by default, use -R to make recursive

svn add PATH

Adds the specified path the version control.
Note, this doesn't add the file to the repository yet, that doesn't happen till you commit.

svn copy SRC[@REV] DEST

Copies something from SRC to DEST.
SRC and DEST can both be either working copies paths, or URLS.
Usual usage would be WC -> WC, or URL -> URL.
URL -> URL is used for branching and tagging

svn delete TARGET

Deletes a file.
If TARGET is a working copy path, the file is scheduled for deletion on the next commit
If TARGET is a repository URL, it immediately deletes the file from the repository.

svn move SRC DST

Moves a the specified target.
Equivalent of a copy then a delete
Maintains history on the moved file.

svn commit -m MESSAGE

Commits the working copy changes to the repository

svn list TARGET

Provied a directory listing for the specified folder in the repository

svn mkdir TARGET

Creates a directory
TARGET can be working copy path or repository URL

svn merge -r N:M SOURCE@REV

Merges the range of revisions starting at N and ending at M from SOURCE into the current working copy
If N > M then it is a reverse merge, and can be used to undo the differences between N and M
Typically used to catch up a feature branch
Can use -c option to pick a single revision
Can supply multiple -c and -r options to cherry pick revisions

svn merge --reintegrate SOURCE@REV

Used to reintegrate a branch into it's parent branch
Working copy should be the parent branch (often trunk)
A branch cannot be reintegrated twice, so good practice to delete the branch afterwads

svn blame TARGET@REV

Outputs target, with author names, and revision numbers attached to changes
Use to see who broke what.

Tuesday, 31 January 2012

Apache Commons Lang StringUtils

So, thought it'd be good to talk about another Java library that I like. It's been around for a while and is not perhaps the most exciting library, but it is very very useful. I probably make use of it daily.

org.apache.commons.lang.StringUtils

StringUtils is part of Apache Commons Lang (http://commons.apache.org/lang/, and as the name suggest it provides some nice utilities for dealing with Strings, going beyond what is offered in java.lang.String. It consists of over 50 static methods, and I'm not going to cover every single one of them, just a selection of methods that I make the most use of.

There are two different versions available, the newer org.apache.commons.lang3.StringUtils and the older org.apache.commons.lang.StringUtils. There are not really any significant differences between the two. lang3.StringUtils requires Java 5.0 and is probably the version you'll want to use.

public static boolean equals(CharSequence str1, CharSequence str2)

Thought I'd start with one of the most straight forward methods. equals. This does exactly what you'd expect, it takes two Strings and returns true if they are identical, or false if they're not.

But java.lang.String already has a perfectly good equals method? Why on earth would I want to use a third party implementation?

It's a fair question. Let's look at some code, can you see any problems?

public void doStuffWithString(String stringParam) {
    if(stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

That's a NullPointerException waiting to happen!

There are a couple of ways around this:

public void safeDoStuffWithString1(String stringParam) {
    if(stringParam != null && stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

public void safeDoStuffWithString2(String stringParm) {
    if("MyStringValue".equals(stringParam)) {
        // do stuff
    }
}

Personally I'm not a fan of either method. I think null checks pollute code, and to me "MyStringValue".equals(stringParam) just doesn't scan well, it looks wrong.

This is where StringUtils.equals comes in handy, it's null safe. It doesn't matter what you pass it, it won't NullPointer on you! So you could rewrite the simple method as follows:

public void safeDoStuffWithString3(String stringParam) {
    if(StringUtils.equals(stringParam,"MyStringValue)) {
        // do stuff
    }
}

It's personal preference, but I think this reads better than the first two examples. There's nothing wrong with them, but I do think StringUtils.equals() is worth considering.

isEmpty, isNotEmpty, isBlank, isNotBlank

OK, these look pretty self explanatory, I'm guessing they're all null safe?

You're probably spotting a pattern here. isEmpty is indeed a null safe replacement for java.lang.String.isEmpty(), and isNotEmpty is it's inverse. So no more null checks:

  if(myString != null && !myString.isEmpty()) { // urghh
     // Do stuff with myString
  }

  if(StringUtils.isNotEmpty(myString)) { // much nicer
     // Do stuff with myString
  }

So, why Blank and Empty?

There is a difference, isBlank also returns true if the String just contains whitespace, ie...

  String someWhiteSpace = "    \t  \n";
  StringUtils.isEmpty(someWhiteSpace); // false
  StringUtils.isBlank(someWhiteSpace); // true

public static String[] split(String str, String separatorChars)

Right that looks just like String.split(), so this is just a null safe version of the built in Java method?

Well, yes it certainly is null safe. Trying to split a null string results in null, and a null separator splits on whitespace. But there is another reason you should consider using StringUtils.split(...), and that's the fact that java.lang.String.split takes a regular expression as a separator. For example the following may not do what you want:

public void possiblyNotWhatYouWant() {
    String contrivedExampleString = "one.two.three.four";
    String[] result = contrivedExampleString.split(".");
    System.out.println(result.length); // 0
}

But all I have to do is put a couple of backslashes in front of the '.' and it will work fine. It's not really a big deal is it?

Perhaps not, but there's one last advantage to using StringUtils.split, and that's the fact that regular expressions are expensive. In fact when I tested splitting a String on a comma (a fairly common use case in my experience), StingUtils.split runs over four times faster!

public static String join(Iterable iterable, String separator)

Ah, finally something genuinely useful!

Indeed I've never found an elegant way of concatenating strings with a separator, there's always that annoying conditional require to check if want to insert the separator or not. So it's nice there's a utility to this for me. Here's a quick example:

  String[] numbers = {"one", "two", "three"};
  StringUtils.join(numbers,",");  // returns "one,two,three"

There's also various overloaded versions of join that take Arrays, and Iterators.

Ok, I'm convinced. This looks like a pretty useful library, what else can it do?

Quite a lot, but like I said earlier I won't bother going through every single method available, I'd just end up repeating what's said in the API documentation. I'd really recommend taking a closer look: http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html

So basically if you ever need to do something with a String that isn't covered by Java's core String library (and maybe even stuff that is), take a look at StringUtils.

Thursday, 10 November 2011

Implicit Conversions in Scala

Following on from the previous post on operator overloading I'm going to be looking at Implicit Conversions, and how we can combine them to with operator overloading to do some really neat things, including one way of creating a multi-parameter conversion.

So what's an "Implicit Conversion" when it's at home?

So lets start with some basic Scala syntax, if you've spent any time with Scala you've probably noticed it allows you to do things like:

   (1 to 4).foreach(println) // print out 1 2 3 4

Ever wondered how it does this? Lets make things more explicit, you could rewrite the above code as:

    val a : Int = 1
    val b : Int = 4
    val myRange : Range = a to b
    myRange.foreach(println)

Scala is creating a Range object directly from two Ints, and a method called to.

So what's going on here? Is this just a sprinkling of syntactic sugar to make writing loops easier? Is to just a keyword in like def or val?

The answers to all this is no, there's nothing special going on here. to is simply a method defined in the RichInt class, which takes a parameter and returns a Range object (specifically a subclass of Range called Inclusive). You could rewrite it as the following if you really wanted to:

   val myRange : Range = a.to(b)

Hang on though, RichInt may have a "to" method but Int certainly doesn't, in your example you're even explicitly casting your numbers to Ints

Which brings me nicely on to the subject of this post, Implicit Conversions. This is how Scala does this. Implicit Conversions are a set of methods that Scala tries to apply when it encounters an object of the wrong type being used. In the case of the to example there's a method defined and included by default that will convert Ints into RichInts.

So when Scala sees 1 to 4 it first runs the implicit conversion on the 1 converting it from an Int primitive into a RichInt. It can then call the to method on the new RichInt object, passing in the second Int (4) as the parameter.

Hmm, think I understand, how's about another example?

Certainly. Lets try to improve our Complex number class we created in the previous post.

Using operator overloading we were able to support adding two complex numbers together using the + operator. eg.

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) = 
            new Complex(this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) = 
            new Complex(this.real - that.real, this.imag - that.imag)

  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  def main(args : Array[String]) : Unit = {
       var a = new Complex(4.0,5.0)
       var b = new Complex(2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 2.0i
  }
}

But what if we want to support adding a normal number to a complex number, how would we do that? We could certainly overload our "+" method to take a Double argument, ie something like...

    def +(n: Double) = new Complex(this.real + n, this.imag)

Which would allow us to do...

    val sum = myComplexNumber + 8.5

...but it'll break if we try...

    val sum = 8.5 + myComplexNumber

To get around this we could use an Implicit Conversion. Here's how we create one.

  object ComplexImplicits {
    implicit def Double2Complex(value : Double) = 
                                     new Complex(value,0.0) 
 }

Simple! Although you do need to be careful to import the ComplexImplicits methods before they can be used. You need to make sure you add the following to the top of your file (even if your Implicits object is in the same file)...

   import ComplexImplicits._

And that's the problem solved, you can now write val sum = 8.5 + myComplexNumber and it'll do what you expect!

Nice. Is there anything else I can do with them?

One other thing I've found them good for is creating easy ways of instantiating objects. Wouldn't it be nice if there were a simpler way of creating one of our complex numbers other than with new Complex(3.0,5.0). Sure you could get rid of the new by making it a case class, or implementing an apply method. But we can do better, how's about just (3.0,5.0)

Awesome, but I'd need some sort of multi parameter implicit conversion, and I don't really see how that's possible!?

The thing is, ordinarily (3.0,5.0) would create a Tuple. So we can just use that tuple as the parameter for our implicit conversion and convert it into a Complex. how we might go about doing this...

  implicit def Tuple2Complex(value : Tuple2[Double,Double]) = 
                               new Complex(value._1,value._2);

And there we have it, a simple way to instantiate our Complex objects, for reference here's what the entire Complex code looks like now.

import ComplexImplicits._

object ComplexImplicits {
  implicit def Double2Complex(value : Double) = new Complex(value,0.0) 

  implicit def Tuple2Complex(value : Tuple2[Double,Double]) = new Complex(value._1,value._2);

}

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) : Complex = (this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) : Complex = (this.real - that.real, this.imag + that.imag)
      
  def unary_~ = Math.sqrt(real * real + imag * imag)
         
  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  
  val i = new Complex(0,1);
  
  def main(args : Array[String]) : Unit = {
       var a : Complex = (4.0,5.0)
       var b : Complex = (2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 8.0i
       println(~b)  // 3.60555
      
       var c = 4 + b
       println(c)  // 6.0 + 3.0i
       var d = (1.0,1.0) + c  
       println(d)  // 7.0 + 4.0i
       
  }

}

"Operator Overloading" in Scala

So, I've been teaching myself Scala recently, and it's a very interesting language.

One of the nice things I like about it, is it's support for creating DSLs, domain specific languages. A domain specific language - or at least my understanding of it - is a language that is written specifically for one problem domain. One example would be SQL, great for querying relational databases, useless for creating first person shooters.

Of course Scala itself is not a DSL, it's a general purpose language. However it does offer several features that allow you to simulate a DSL, in particular operator overloading, and implicit conversions. In this post I'm going to focus on the first of these...

Operator Overloading

So what's operator overloading?

Well operators are typically things such as +, -, and !. You know those things you use to do arithmetic on numbers, or occasionally for manipulating Strings. Well, operator overloading - just like method overloading - allows you to redefine their behaviour for a particular type, and give them meaning for your own custom classes.

Hang on a minute! I'm sure someone once told me operator overloading was evil?

Indeed, this is quite a controversial topic. It's considered far too open for abuse by some, and was so maligned in C++ that the creators of Java deliberately disallowed it (excepting "+" for String concatenation).

I'm of a slightly different opinion, used responsibly it can be very useful. For example lots of different objects support a concept of addition, so why not just use an addition operator?

Lets say you were developing a complex number class, and you want to support addition. Wouldn't it be nicer to write...

Complex result = complex1 + complex2;

...rather than...

Complex result = complex1.add(complex2);

The first example is much more natural don't you think?

So Scala allows you to overload operators then?

Well, not really. In fact, technically not at all.

So all this is just a tease? This is the most stupid blog post I've ever read. Scala's rubbish. I'm going back to Algol 68.

Wait a second, I've not finished. You see Scala doesn't support operator overloading, because it doesn't have operators!

Scala doesn't have operators? You've gone mad, I write stuff like "sum = 2 + 3" all the time, and what about all those funny list operations? "::", and ":/". They look like operators to me!

Well they're not. The thing is, Scala has a rather relaxed attitude to what you can name a method.

When you write...

sum = 2 + 3,

...you're actually calling a method called + on a RichInt type with a value of 2. You could even rewrite it as...

sum = 2.+(3)

...if you really really wanted to.

Aha, I got it. So how do you go about overloading an operator then?

Simple, it's exactly the same as writing a normal method. Here's an example.

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) = 
            new Complex(this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) = 
            new Complex(this.real - that.real, this.imag - that.imag)

  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  def main(args : Array[String]) : Unit = {
       var a = new Complex(4.0,5.0)
       var b = new Complex(2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 2.0i
  }
}

Ok that's nice, what if I wanted a "not" operator though, ie something like a "!"

That's a unary prefix operator, and yes scala can support these, although in a more limited fashion than an infix operator like "+"

Only four operators can be supported in this fashion, +, -, !, and ~. You simply need to call your methods unary_! or unary_~, etc. Here's how you might add a "~" to calculate the magnitude of a Complex number to our complex number class

class Complex(val real : Double, val imag : Double) {
    // ...
    def unary_~ = Math.sqrt(real * real + imag * imag)
}

object Complex {
  def main(args : Array[String]) : Unit = {
     var b = new Complex(2.0,3.0)
     prinln(~b) //  3.60555
   }
}

So that's all pretty simple, but please use responsibly. Don't create methods called "+" unless your class really does something that could be interpreted as addition. And never ever redefine the binary shift left operator "<<" as some sort of substitute for println. It's not clever and you'll make the Scala gods angry.

Hope you found that useful. Next up I'll cover implicit conversions. Another nice feature of Scala that really allows you to write your code in a more natural way

Tuesday, 25 October 2011

Collection creation and Immutability with Google Guava

So, thought I'd take a look at some of the collection creation patterns Guava offers, and also some of the Immutable collection types it offers.

If you've not seen my previous posts, you may want to start here:

Guava part 1 - MultiMaps
Guava part 2 - BiMaps
Guava part 3 - MultiSets

create methods

All of Guava's collection implementations contain one or more static create methods, these do what you'd expect, and generally offer a slightly more concise way of instantiating the collection classes.

Here are two different ways of creating a ArrayListMultimap

    Multimap<String,String> multimap1 = new ArrayListMultimap<String,String>();
    Multimap<String,String> multimap2 = ArrayListMultimap.create();

Ok, so there's not a huge amount in it - 12 characters in this example - but the way I see it, you're removing some redundancy, do we really need to reproduce the Generic type information twice?

That's nice, it's a shame Sun didn't think to add create methods to their Collection types when the created Java 5!

Again, Guava rides to your rescue here, and provides some utility classes for dealing with the standard collection types. com.google.common.collect.Lists, com.google.common.collect.Sets, and com.google.common.collect.Maps

These each provide several methods of the format newCollectionType(), here's some examples

    List<String> myList1 = new ArrayList<String>(); //old way
    List<String> myList2 = Lists.newArrayList();    //guava way

    Set<String> mySet1 = new HashSet<String>(); //old way
    Set<String> mySet2 = Sets.newHashSet();     //guava way

Since the "new" methods are static, you can cut out even more characters by using a static import ie...

    import static com.google.common.collect.Lists.newArrayList;
    import static com.google.common.collect.Sets.newHashSet;

    //elsewhere in code
    List<String> myList2 = newArrayList();
    Set<String> mySet2 = newHashSet();

Keypress wise you're not really saving a great deal using the guava methods, so I guess this is a matter of taste as much as anything else. Personally I think the Guava way reads much better, although I think I'd go without the static imports.

Good so far, but hardly earth shattering, what next?

Immutable collections

These are essentially collection objects that you can't change once they've been created, and are useful for all sorts of reasons. Guava provides Immutable implementations for most of the regular Collection interfaces: ImmutableList, ImmutableSet, and ImmutableMap , and also immutable implementations of some of the Guava collection interfaces ( ImmutableMultimap etc.)

My main use for them is creating static constants. For example lets say you need to hardcode a Set of Strings for some purpose. One way to do this might be, eg.

    private static final Set<String> farmAnimals =
                new HashSet<String>(Arrays.asList("Cow","Pig","Sheep"));

Doesn't look great does it, and it suffers from one major problem. Any code that can access this Set can also change it, which could lead to all sorts of unexpected problems.

Couldn't we just use Collection.unmodifiableSet(Set s) to solve this?

Well, in this particular example I guess we could write...

    private static final Set<String> farmAnimals =
         Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("Cow","Pig","Sheep")));

...but that's starting to look a bit unwieldy, and the unmodifiable methods have one other problem. They only return an unmodifiable view of the collection, if you have a reference to the original collection, you can still alter it!

Whilst this may not be a problem in the last example, I still think a much better way of doing this is to use an ImmutableSet

    private static final Set<String> farmAnimals = 
                       ImmutableSet.of("Cow","Pig","Sheep");

That's much nicer isn't it! And there's several other ways we can create them, here's some examples:

    // use copyOf()...
    public void doStuffWithList(List<Object> unsafeList) {
       List<Object> safeList = ImmutableList.copyOf(unsafeList);
    }

    // use a builder...
    public Map<String,Integer> makeImmutableMap() {
        ImmutableMap.Builder<String,Integer> mapBuilder = 
                         new ImmutableMap.Builder<String,Integer>();
        Entry<String,Integer> entry = null;
        while((entry = getEntry()) != null) {
            mapBuilder.put(entry.getKey(), entry.getValue());
        }
        return builder.build();
    }

So, any other advantages of using Immutable collections?

Well there's several. They can simplify logic considerably, especially in multi-threaded environments. If threads only have read access to an object, them you don't need to worry about complicated thread synchronization logic

They are also slightly more efficient to use once they've been created. If a collection knows beforehand what it needs to store, and there's never going to be any changes, you can make various time and space savings. For example, most implementations of ArrayLists or HashMaps, will leave some unused space for new objects, so they don't have to constantly resize themselves. If you know there's never going to be any new objects, there's no need for this.

Finally you could also use them as hash keys. If the contents of a collection can't change, then neither will it's hashcode!

Any disadvantages?

There is of course one big disadvantage of Immutable objects, which is pretty obvious. You can't change them! If you need to alter the collection, you'll first need to take a copy of it. In certain situations - ie where concurrency is a concern - you may in fact want to take this approach. However this is going to be impractical where collections contain many many objects and you'll probably want a good old fashioned mutable collection (complete with synchronization code if required).

The only other thing to be aware of is, just because your collection is immutable, it doesn't mean the objects contained in them automatically are. If you can get a reference to an object in an immutable collection, then there's nothing to stop you changing any mutable state on that object! As a consequence it's best practice to make sure anything you keep in an immutable collection is immutable itself!

Thursday, 22 September 2011

Multisets

Guava part 1 - MultiMaps
Guava part 2 - BiMaps

Continuing this tour of Guava we get to the Multiset. I probably don't use this as much as Multimaps or Bimaps, but it certainly does have it's uses.

So what's a Multiset then?

Well as you might be able to guess it's a set that can hold multiple instances of the same object.

Isn't that just a List?

In Java there are two basic differences between Lists and Sets. Lists can hold duplicates of the same object, and Lists are always ordered. Sets can't hold duplicates, and there's no guarantee of order by the Set interface. (Some implementations - LinkedHashSet, SortedSet etc. - do of course provide a guaranteed order!)

So a Multiset occupies a sort of grey area between a List and a Set. Duplicates allowed, but no guaranteed order.

This collection is also sometimes called a Bag, in fact this is what Apache Commons Collections calls it's Mutlisets.

So what would I use one for?

The great thing about Multisets is they keep track of the counts of each particular object in the set. So you can use them for counting stuff.

Have you ever written code like the following:

Map<MyClass,Integer> objectCounts = new HashMap<MyClass,Integer>();

public void incrementCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        objectCounts.put(obj,0);
    } else {
        objectCounts.put(obj,count++);
    }
}

public int getCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        return 0;
    } else {
        return count;
    }
}

Bit unwieldy? Lets see how we might use a Multiset instead:

Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();

myMultiset.add(myObject);
myMultiset.add(myObject);  // add it a second time.

System.out.println(myMultiset.count(myObject)); // 2

myMultiset.remove(myObject);
System.out.println(myMultiset.count(myObject)); // 1

As you can see that's much simpler! It's even possible to add/remove more than one object at at time

Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();
myMultiset.add(myObject,5); // Add 5 copies of myObject

System.out.println(myMultiset.count(myObject)); // 5

myMultiset.remove(myObject,2); // remove 2 copies

System.out.println(myMultiset.count(myObject)); // 3

Pretty useful eh? As usual there's several implementations available depending on your requirements, and I recommend taking a look at the API: http://docs.guava-libraries.googlecode.com/git-history/v9.0/javadoc/com/google/common/collect/Multiset.html