Monday, 27 February 2012

Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I've yet to explore. So thought I ought to take a closer look, and compare with StringUtils, and I have to admit I was surprised at what I found.

Splitting strings eh? There can't be many different ways of doing this surely?

Well Guava and StringUtils do take a sylisticly different approach. Lets start with the basic usage.

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three",',');

// Guava splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");

So, my first observation is that Splitter is more object orientated. You have to create a splitter object, which you then use to do the splitting. Whereas the StringUtils splitter methods uses a more functional style, with static methods.

Here I much prefer Splitter. Need a reusable splitter that splits comma separated lists? A splitter that also trims leading and trailing white space, and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyString()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); //"one","two","three"
niceCommaSplitter.split("  four  ,  five  "); //"four","five"
That looks really useful, any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

Don't really see that making much of a difference, most of the time I just want to loop through the tokens in order anyway!

I also didn't think it was a big deal, until I examined the performance of the two approaches. To do this I tried running the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
    StringUtils.split(numberList , ',');   
}
System.out.println(System.currentTimeMillis() - start);
  
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
    Splitter.on(',').split(numberList );
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times:

594
31


Guava's Splitter is almost 10 times faster!

Now this is a much bigger difference than I was expecting, Splitter is over 10 times faster than StringUtils. How can this be? Well, I suspect it's something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings! So Splitter doesn't actually need to create new String objects.

It's also worth noting you can cache your Splitter object, which results in an even faster runtime.

Blimey, end of argument? Guava's Splitter wins every time?

Hold on a second. This isn't quite the full story. Notice we're not actually doing anything with the result of the Strings? Like I mentioned, it looks like the Splitter isn't actually creating any new Strings. I suspect it's actually deferring this to the Iterator object it returns.

So can we test this?

Sure thing. Here's some code to repeatedly check the lengths of the generated substrings:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
  final String[] numbers = StringUtils.split(numberList, ',');
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);
  
Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
  Iterable<String> numbers = splitter.split(numberList);
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:

609
2048


Guava's Splitter is almost 4 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. Looks like by returning an Iterable, Splitter is trading immediate gains, for longer term pain. There's also a moral here about making sure performance tests are actually testing something useful.

In conclusion I think I'll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still I was surprised by the result, and if you're splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.

9 comments:

Loboplaya said...

Very interesting article. I often hesitate between Google Guava or Apache.

Guava feels more modern though, and Google will probably iron out (most of) the performace issue you found.

No risk that Guava is shut down anyway :-) .

Sean said...

The root cause is array much faster than iterable

maxrohde.com said...

Thank you for the comparison. I wonder if by some trick the performance of Iterable iteration could be matched with iterating over an array in future versions of Java, or if that's impossible?

Stanislav Chetvertkov said...
This comment has been removed by the author.
David E. Jacobs said...


As for me, i don't http://www.promptessay.com/appreciate that .They have to change their opinion about whom to f*ck, cuz that's rea'lly delirious. Even an essay writer proved my words

Алексей Бондарчук said...

Nice effort, essaymarket.co.uk very informative, this will help me to complete my task. Thanks for share it keep it up.

Fred Duffler said...

I write programm. http://www.online-essay-writer.org/ And thanks to your blog I don't have any problem with this!

Алексей Бондарчук said...

great)(www masterpapers com) liked everything very much) keep it up and dont stop)

Nikolo Lloyd said...

Some website page - precisely what I would like. We seemed following a anything for a little bit to analyze on by myself periodical, and yes it goes apart a both equally I had put together no think this issue, nor I won't will perform essentially an extended evaluation. Your internet site features amenable youthful methods with myself, any time I can assert now, Which i value.
write an essay