Thursday 22 September 2011

Multisets

Guava part 1 - MultiMaps
Guava part 2 - BiMaps

Continuing this tour of Guava we get to the Multiset. I probably don't use this as much as Multimaps or Bimaps, but it certainly does have it's uses.

So what's a Multiset then?

Well as you might be able to guess it's a set that can hold multiple instances of the same object.

Isn't that just a List?

In Java there are two basic differences between Lists and Sets. Lists can hold duplicates of the same object, and Lists are always ordered. Sets can't hold duplicates, and there's no guarantee of order by the Set interface. (Some implementations - LinkedHashSet, SortedSet etc. - do of course provide a guaranteed order!)

So a Multiset occupies a sort of grey area between a List and a Set. Duplicates allowed, but no guaranteed order.

This collection is also sometimes called a Bag, in fact this is what Apache Commons Collections calls it's Mutlisets.

So what would I use one for?

The great thing about Multisets is they keep track of the counts of each particular object in the set. So you can use them for counting stuff.

Have you ever written code like the following:
Map<MyClass,Integer> objectCounts = new HashMap<MyClass,Integer>();

public void incrementCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        objectCounts.put(obj,0);
    } else {
        objectCounts.put(obj,count++);
    }
}

public int getCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        return 0;
    } else {
        return count;
    }
}
Bit unwieldy? Lets see how we might use a Multiset instead:
Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();

myMultiset.add(myObject);
myMultiset.add(myObject);  // add it a second time.

System.out.println(myMultiset.count(myObject)); // 2

myMultiset.remove(myObject);
System.out.println(myMultiset.count(myObject)); // 1

As you can see that's much simpler! It's even possible to add/remove more than one object at at time
Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();
myMultiset.add(myObject,5); // Add 5 copies of myObject

System.out.println(myMultiset.count(myObject)); // 5

myMultiset.remove(myObject,2); // remove 2 copies

System.out.println(myMultiset.count(myObject)); // 3
Pretty useful eh? As usual there's several implementations available depending on your requirements, and I recommend taking a look at the API: http://docs.guava-libraries.googlecode.com/git-history/v9.0/javadoc/com/google/common/collect/Multiset.html

Thursday 8 September 2011

BiMaps

Guava part 1 - MultiMaps

Next up on my tour of Guava, is the BiMap, another useful collection type. It's pretty simple really, a BiMap is simply a two way map.

Inverting a Map

A normal java map is a set of keys and values, and you can look up values by key, very useful, eg lets say I wanted to create a (very rudimentary) British English to American English dictionary:
Map<String,String> britishToAmerican = Maps.newHashMap();
britishToAmerican.put("aubergine","egglant");
britishToAmerican.put("courgette","zucchini");
britishToAmerican.put("jam","jelly");
But what if you want an American to British dictionary? Well you could write some code to invert the map:
    // Generic method to reverse map.
    public %lt;S,T> Map<T,S> getInverseMap(Map<S,T> map) {
  Map<T,S> inverseMap = new HashMap<T,S>();
  for(Entry<S,T> entry: map.entrySet()) {
   inverseMap.put(entry.getValue(), entry.getKey());
  }
  return inverseMap;
 }
It'll do the job, but there's several complications you might need to think about.
  • How do we handle duplicate values in the original map? At the moment they'll be silently overwritten in the reverse map.
  • What if we want to put a new entry in the reversed map? We'd also have to update the original map! This could get annoying.

BiMaps

Well, guess what? This is the sort of situation a BiMap is designed for! And here's how you might use it.

BiMap<String,String> britishToAmerican = HashBiMap.create();

// Initialise and use just like a normal map
britishToAmerican.put("aubergine","egglant");
britishToAmerican.put("courgette","zucchini");
britishToAmerican.put("jam","jelly");

System.out.println(britishToAmerican.get("aubergine")); // eggplant

BiMap<String,String> americanToBritish = britishToAmerican.inverse();

System.out.println(americanToBritish.get("eggplant")); // aubergine
System.out.println(americanToBritish.get("zucchini")); // courgette
Pretty simple really, but there's a few things to notice.

Enforcing uniqueness

Firstly the BiMap enforces uniqueness of it's values, and will give you an illegal argument exception if you try to insert a duplicate value, ie
britishToAmerican.put("pudding","dessert");
britishToAmerican.put("sweet","dessert"); // IllegalArgumentException.
If you need to add a values that has already been added there's a forcePut method that will overwrite the entry with the duplicate value.
britishToAmerican.put("pudding","dessert");
britishToAmerican.forcePut("sweet","dessert");  // Overwrites the previous entry
System.out.println(britishToAmerican.get("sweet")); // dessert
System.out.println(britishToAmerican.get("pudding")); // null

The inverse method

The other crucial thing to understand is the inverse method, this returns the inverse BiMap, ie the a map with the keys and values switched round.

Now this inverse map, isn't just a new map, such as my earlier reverseMap method might have created. It's actually a view of the of the original map. This means that any subsequent changes to the inverse method will affect the original map!
americanToBritish.put("potato chips","crisps");
System.out.println(britishToAmerican.containsKey("crisps")); // true
System.out.println(britishToAmerican.get("crisps")); // potato chips

So that's the BiMap, like I said pretty simple. As usual there are several implementations available, and as ever I recommend taking a look at the full API documentation: http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/collect/BiMap.html

Next up, Multisets!

Guava part 3 - Multisets

Thursday 1 September 2011

Multimaps - Google Guava

Guava?

This is the first in a series of posts where I'll be attempting to explain and explore Google's awesome Guava java library.

I first came across Guava whilst searching for generic versions of Apache Commons Collections - I needed a Bimap and was fed up with having to pepper my code with casts - however what I found was much much better.

Not only does it contain various implementations of more complex (but useful) collection types - Multimaps, Multisets, Bimaps - which I'll discuss in detail, but also facilities to support a more functional style of programming with immutable collections, and function and predicate objects. This has both completely changed the way I write java, and at the same time made me increasingly frustrated with Java's sometimes clunky syntax, something I intend to explore in further posts.

Anyway enough with the introduction, and on with the good stuff. The first thing I'd like to take a look at is the Multimap, which is probably the single Guava feature I've made the most use of.

Mutlimaps

So, how often have you needed a data structure like the following?

Map<String,List<MyClass>> myClassListMap test2
                              = new HashMap<String,List<MyClass>>()


If you're anything like me, fairly frequently. And don't you find yourself writing the same boilerplate code over and over again?

To put a key/value pair into this map, you need to first check if a list already exists for your key, and if it doesn't create it. You'll end up writing something along the lines of the following:

void putMyObject(String key, Object value) {
    List<Object> myClassList = myClassListMap.get(key);
    if(myClassList == null) {
        myClassList = new ArrayList<object>();
        myClassListMap.put(key,myClassList);
    }
    myClassList.add(value);
}


Bit of a pain, and what if you need methods to check a value exists, or remove a value, or even iterate over the entire data structure. That can be quite a lot of code.

Never fear Guava is here!

Just like the standard java collections, Guava defines several interfaces and matching implementations. Usually you want to code to an interface, and only worry about the implementation when you create it. In this case we're interested in Multimaps.

So using a multimap, we could replace the data structure declaration with the following:

Multimap<String,Object> myMultimap = ArrayListMultimap.create();


There's a few things to note here. The generic type declaration should look very familiar, this is exactly how you would declare a normal Map.

You may have been expecting to see new ArrayListMultimap<String,Object>() on the right-hand side of the equals. Well, all Guava collection implementations offer a create method, which is usually more concise and has the advantage that you do not have to duplicate the generic type information.

Guava in fact adds similar functionality to the standard Java collections. For example, if you examine com.google.common.collect.Lists, you'll see static newArrayList(), and newLinkedList() methods, so you can take advantage of this conciseness even with the standard Java collections. (I'll aim to cover this in more detail in a future post).

So we've declared and instantiated a multimap, how do we go about using them? Easy just like a normal map!

public class MutliMapTest {
    public static void main(String... args) {
  Multimap<String, String> myMultimap = ArrayListMultimap.create();
  
  // Adding some key/value
  myMultimap.put("Fruits", "Bannana");
  myMultimap.put("Fruits", "Apple");
  myMultimap.put("Fruits", "Pear");
  myMultimap.put("Vegetables", "Carrot");
  
  // Getting the size
  int size = myMultimap.size();
  System.out.println(size);  // 4
  
  // Getting values
  Collection<string> fruits = myMultimap.get("Fruits");
  System.out.println(fruits); // [Bannana, Apple, Pear]
  
  Collection<string> vegetables = myMultimap.get("Vegetables");
  System.out.println(vegetables); // [Carrot]
  
  // Iterating over entire Mutlimap
  for(String value : myMultimap.values()) {
   System.out.println(value);
  }
  
  // Removing a single value
  myMultimap.remove("Fruits","Pear");
  System.out.println(myMultimap.get("Fruits")); // [Bannana, Pear]
  
  // Remove all values for a key
  myMultimap.removeAll("Fruits");
  System.out.println(myMultimap.get("Fruits")); // [] (Empty Collection!)
 }
}


One thing you may be wondering, is why does the get method return a Collection and not a List, that would be much more useful. Indeed it would. The problem is there are several different implementations available, some use Lists - ArrayListMultimap, LinkedListMultimap etc. - and some use Sets - HashMultimap, TreeMultimap among others.

To handle this - if you need to work directly with the Lists, or Sets in the map - there are several subinterfaces defined. ListMultimap, SetMultimap, and SortedSetMultimap. These all do what you'd expect, and their methods that return collections, will return one of the approprite type.

ie

ListMutlimap<String,String> myMutlimap = ArrayListMultimap.create();

List<string> myValues = myMutlimap.get("myKey");  // Returns a List, not a Collection.



That's basically all there is to them. I recommend looking at the API: http://docs.guava-libraries.googlecode.com/git-history/release09/javadoc/com/google/common/collect/Multimap.html, where you can find the various implementations, you should be able to find one that suits your needs.

So, that's all for now. In my next post, I'll be introducing the BiMap

Guava part 1 - MultiMaps
Guava part 2 - BiMaps
Guava part 3 - Multisets