Monday, 27 February 2012

Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I've yet to explore, so I thought I ought to take a closer look and compare it with StringUtils. I was surprised at what I found.

Splitting strings eh? There can't be many different ways of doing this surely?

Well, Guava and StringUtils do take stylistically different approaches. Let's start with the basic usage.

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three",',');

// Guava splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");

So, my first observation is that Splitter is more object-oriented. You have to create a Splitter object, which you then use to do the splitting, whereas the StringUtils split methods use a more functional style, with static methods.

Here I much prefer Splitter. Need a reusable splitter that splits comma separated lists? A splitter that also trims leading and trailing white space, and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyStrings()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); //"one","two","three"
niceCommaSplitter.split("  four  ,  five  "); //"four","five"

That looks really useful, any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

Don't really see that making much of a difference, most of the time I just want to loop through the tokens in order anyway!

I also didn't think it was a big deal, until I examined the performance of the two approaches. To do this I tried running the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    StringUtils.split(numberList, ',');
}
System.out.println(System.currentTimeMillis() - start);

start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Splitter.on(',').split(numberList);
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times:

594
31


Guava's Splitter is almost 20 times faster!

Now this is a much bigger difference than I was expecting: 594ms against 31ms makes Splitter roughly 19 times faster than StringUtils. How can this be? Well, I suspect it's something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings! So Splitter doesn't actually need to create any new String objects up front.

It's also worth noting you can cache your Splitter object, which results in an even faster runtime.

Blimey, end of argument? Guava's Splitter wins every time?

Hold on a second. This isn't quite the full story. Notice we're not actually doing anything with the resulting Strings? Like I mentioned, it looks like the Splitter isn't actually creating any new Strings. I suspect it's deferring that work to the Iterator object it returns.
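To make that suspicion concrete, here's a minimal sketch of what a lazy splitter could look like, using nothing but the JDK. To be clear, this is not Guava's actual implementation: the class LazySplitSketch and its substringsCreated counter are names I've made up purely for illustration. The point is just that split() can return straight away, and the substrings only get created as you iterate.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is NOT Guava's actual implementation.
final class LazySplitSketch {

    // Counts the substrings created so far, so the laziness can be observed.
    static int substringsCreated = 0;

    // Returns immediately: no scanning and no substring creation happens here.
    static Iterable<String> split(final String input, final char separator) {
        return () -> new Iterator<String>() {
            private int position = 0;         // start index of the next token
            private boolean finished = false; // true once the last token is returned

            @Override
            public boolean hasNext() {
                return !finished;
            }

            @Override
            public String next() {
                if (finished) {
                    throw new NoSuchElementException();
                }
                int end = input.indexOf(separator, position);
                if (end < 0) {
                    // No more separators: the last token runs to the end of the input.
                    end = input.length();
                    finished = true;
                }
                // The real work is deferred until this point.
                String token = input.substring(position, end);
                substringsCreated++;
                position = end + 1;
                return token;
            }
        };
    }

    public static void main(String[] args) {
        Iterable<String> tokens = split("One,Two,Three", ',');
        System.out.println(substringsCreated); // 0: nothing has been created yet
        for (String token : tokens) {
            System.out.println(token);
        }
        System.out.println(substringsCreated); // 3: one per token, created while iterating
    }
}
```

If Splitter works anything like this, then a loop that calls split() and throws away the result is timing almost nothing at all.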

So can we test this?

Sure thing. Here's some code to repeatedly check the lengths of the generated substrings:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    final String[] numbers = StringUtils.split(numberList, ',');
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Iterable<String> numbers = splitter.split(numberList);
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:

609
2048


Guava's Splitter is more than 3 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. It looks like, by returning an Iterable, Splitter is trading immediate gains for longer-term pain. There's also a moral here about making sure performance tests are actually testing something useful.
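On that last point, here's a rough sketch of a slightly more careful timing loop. It's an assumption-laden illustration, not a proper benchmark: BenchSketch, Result and totalTokenLength are names I've invented, and the JDK's own String.split stands in for the library call so that the example runs without any extra jars. The two ideas it shows are a warm-up phase before timing, and a checksum built from the results so the work being measured can't be skipped. For anything serious, a dedicated harness such as JMH would be the more rigorous choice.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; none of these names come from a library.
final class BenchSketch {

    // Outcome of one timed run: elapsed time plus checksums of the work done.
    static final class Result {
        final long elapsedMillis;
        final long warmUpChecksum;
        final long checksum;

        Result(long elapsedMillis, long warmUpChecksum, long checksum) {
            this.elapsedMillis = elapsedMillis;
            this.warmUpChecksum = warmUpChecksum;
            this.checksum = checksum;
        }
    }

    // Runs the work untimed first (warm-up), then times the measured iterations.
    // Every result is folded into a checksum so the work has an observable effect.
    static Result time(IntSupplier work, int warmUpIterations, int iterations) {
        long warmUpChecksum = 0;
        for (int i = 0; i < warmUpIterations; i++) {
            warmUpChecksum += work.getAsInt();
        }
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += work.getAsInt();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(elapsedMillis, warmUpChecksum, checksum);
    }

    // Work under test: split the input and touch every token.
    // String.split is a stand-in here for StringUtils.split or Splitter.split.
    static int totalTokenLength(String input) {
        int total = 0;
        for (String token : input.split(",")) {
            total += token.length();
        }
        return total;
    }

    public static void main(String[] args) {
        final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
        Result result = time(() -> totalTokenLength(numberList), 100_000, 1_000_000);
        System.out.println(result.elapsedMillis + "ms, checksum " + result.checksum);
    }
}
```

Printing the checksum alongside the time is a cheap sanity check: if two approaches don't produce the same checksum, they aren't doing the same work, and comparing their timings doesn't tell you much.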

In conclusion I think I'll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still I was surprised by the result, and if you're splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.

Monday, 6 February 2012

Useful SVN commands

I have been increasingly using command line SVN these days. I find it's just a bit quicker and more reliable than the GUI clients I had been using. I've mainly written this for myself, as a quick reference for the commands I use most frequently.

svn help COMMAND
  • Displays help for a particular svn command.
  • Lists all available commands if none is specified

svn co URL[@REV]
  • Check out: Creates a local working copy of the repository found at the URL

svn log
  • Provides a log of commit messages
  • Use with -l 10 to limit to 10 most recent messages
  • Use -v for verbose output (lists changed files)
  • Use -r 12345 to get info on particular revision

svn info
  • Provides useful information about the current working copy, such as repository URL, and current revision

svn status
  • Provides a list of differences between the working copy and the repository. Take a look at the help (svn help status) to find out what the different column values mean.

svn diff
  • Displays local modifications
  • Use -r N:M to display differences between two revisions
  • Use -c to see the changes for a particular revision

svn revert PATH
  • Reverts the specified path to the contents of the repository. You will lose any local changes!
  • Not recursive by default, use -R to make recursive

svn add PATH
  • Adds the specified path to version control.
  • Note, this doesn't add the file to the repository yet; that doesn't happen until you commit.

svn copy SRC[@REV] DEST
  • Copies something from SRC to DEST.
  • SRC and DEST can each be either a working copy path or a URL.
  • Usual usage would be WC -> WC, or URL -> URL.
  • URL -> URL is used for branching and tagging

svn delete TARGET
  • Deletes a file.
  • If TARGET is a working copy path, the file is scheduled for deletion on the next commit
  • If TARGET is a repository URL, it immediately deletes the file from the repository.

svn move SRC DST
  • Moves the specified target.
  • Equivalent of a copy then a delete
  • Maintains history on the moved file.

svn commit -m MESSAGE
  • Commits the working copy changes to the repository

svn list TARGET
  • Provides a directory listing for the specified folder in the repository

svn mkdir TARGET
  • Creates a directory
  • TARGET can be working copy path or repository URL

svn merge -r N:M SOURCE@REV
  • Merges the range of revisions starting at N and ending at M from SOURCE into the current working copy
  • If N > M then it is a reverse merge, and can be used to undo the differences between N and M
  • Typically used to catch up a feature branch
  • Can use -c option to pick a single revision
  • Can supply multiple -c and -r options to cherry pick revisions

svn merge --reintegrate SOURCE@REV
  • Used to reintegrate a branch into its parent branch
  • Working copy should be the parent branch (often trunk)
  • A branch cannot be reintegrated twice, so it's good practice to delete the branch afterwards

svn blame TARGET@REV
  • Outputs target, with author names, and revision numbers attached to changes
  • Use to see who broke what.