Monday 27 February 2012

Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I've yet to explore. So thought I ought to take a closer look, and compare with StringUtils, and I have to admit I was surprised at what I found.

Splitting strings eh? There can't be many different ways of doing this surely?

Well Guava and StringUtils do take a sylisticly different approach. Lets start with the basic usage.

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three",',');

// Guava splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");

So, my first observation is that Splitter is more object orientated. You have to create a splitter object, which you then use to do the splitting. Whereas the StringUtils splitter methods uses a more functional style, with static methods.

Here I much prefer Splitter. Need a reusable splitter that splits comma separated lists? A splitter that also trims leading and trailing white space, and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyString()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); //"one","two","three"
niceCommaSplitter.split("  four  ,  five  "); //"four","five"
That looks really useful, any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

Don't really see that making much of a difference, most of the time I just want to loop through the tokens in order anyway!

I also didn't think it was a big deal, until I examined the performance of the two approaches. To do this I tried running the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
    StringUtils.split(numberList , ',');   
}
System.out.println(System.currentTimeMillis() - start);
  
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
    Splitter.on(',').split(numberList );
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times:

594
31


Guava's Splitter is almost 10 times faster!

Now this is a much bigger difference than I was expecting, Splitter is over 10 times faster than StringUtils. How can this be? Well, I suspect it's something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings! So Splitter doesn't actually need to create new String objects.

It's also worth noting you can cache your Splitter object, which results in an even faster runtime.

Blimey, end of argument? Guava's Splitter wins every time?

Hold on a second. This isn't quite the full story. Notice we're not actually doing anything with the result of the Strings? Like I mentioned, it looks like the Splitter isn't actually creating any new Strings. I suspect it's actually deferring this to the Iterator object it returns.

So can we test this?

Sure thing. Here's some code to repeatedly check the lengths of the generated substrings:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
long start = System.currentTimeMillis();  
for(int i=0; i<1000000; i++) {
  final String[] numbers = StringUtils.split(numberList, ',');
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);
  
Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for(int i=0; i<1000000; i++) {
  Iterable<String> numbers = splitter.split(numberList);
    for(String number : numbers) {
      number.length();
    }
  }
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:

609
2048


Guava's Splitter is almost 4 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. Looks like by returning an Iterable, Splitter is trading immediate gains, for longer term pain. There's also a moral here about making sure performance tests are actually testing something useful.

In conclusion I think I'll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still I was surprised by the result, and if you're splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.

Monday 6 February 2012

Useful SVN commands

Usefull SVN commands

Have been increasingly using command line SVN these days. Find it's just a bit quicker and more reliable than the GUI clients I had been using. I've mainly written this for myself, as a quick reference for the commands I use most frequently

svn help COMMAND
  • Displays help for a particular svn command.
  • Lists all available commands if none is specified

svn co URL[@REV]
  • Check out: Creates a local working copy of the repository found at the URL

svn log
  • Provides a log of commit messages
  • Use with -l 10 to limit to 10 most recent messages
  • Use -v for verbose output (lists changed files)
  • Use -r 12345 to get info on particular revision

svn info
  • Provides useful information about the current working copy, such as repository URL, and current revision

svn status
  • Provides a list of diffences between the working copy, and the repostory. Take a look at the help (svn help status to find out what the different column values mean)

svn diff
  • Displays local modifications
  • Use -r N:M to display differences between two revisions
  • Use -c to see the changes for a particular revision

svn revert PATH
  • Reverts the specified path to the contents of the repository. You will lost any local changes!
  • Not recursive by default, use -R to make recursive

svn add PATH
  • Adds the specified path the version control.
  • Note, this doesn't add the file to the repository yet, that doesn't happen till you commit.

svn copy SRC[@REV] DEST
  • Copies something from SRC to DEST.
  • SRC and DEST can both be either working copies paths, or URLS.
  • Usual usage would be WC -> WC, or URL -> URL.
  • URL -> URL is used for branching and tagging

svn delete TARGET
  • Deletes a file.
  • If TARGET is a working copy path, the file is scheduled for deletion on the next commit
  • If TARGET is a repository URL, it immediately deletes the file from the repository.

svn move SRC DST
  • Moves a the specified target.
  • Equivalent of a copy then a delete
  • Maintains history on the moved file.

svn commit -m MESSAGE
  • Commits the working copy changes to the repository

svn list TARGET
  • Provied a directory listing for the specified folder in the repository

svn mkdir TARGET
  • Creates a directory
  • TARGET can be working copy path or repository URL

svn merge -r N:M SOURCE@REV
  • Merges the range of revisions starting at N and ending at M from SOURCE into the current working copy
  • If N > M then it is a reverse merge, and can be used to undo the differences between N and M
  • Typically used to catch up a feature branch
  • Can use -c option to pick a single revision
  • Can supply multiple -c and -r options to cherry pick revisions

svn merge --reintegrate SOURCE@REV
  • Used to reintegrate a branch into it's parent branch
  • Working copy should be the parent branch (often trunk)
  • A branch cannot be reintegrated twice, so good practice to delete the branch afterwads
svn blame TARGET@REV
  • Outputs target, with author names, and revision numbers attached to changes
  • Use to see who broke what.

Tuesday 31 January 2012

Apache Commons Lang StringUtils

So, thought it'd be good to talk about another Java library that I like. It's been around for a while and is not perhaps the most exciting library, but it is very very useful. I probably make use of it daily.

org.apache.commons.lang.StringUtils

StringUtils is part of Apache Commons Lang (http://commons.apache.org/lang/, and as the name suggest it provides some nice utilities for dealing with Strings, going beyond what is offered in java.lang.String. It consists of over 50 static methods, and I'm not going to cover every single one of them, just a selection of methods that I make the most use of.

There are two different versions available, the newer org.apache.commons.lang3.StringUtils and the older org.apache.commons.lang.StringUtils. There are not really any significant differences between the two. lang3.StringUtils requires Java 5.0 and is probably the version you'll want to use.



public static boolean equals(CharSequence str1, CharSequence str2)

Thought I'd start with one of the most straight forward methods. equals. This does exactly what you'd expect, it takes two Strings and returns true if they are identical, or false if they're not.

But java.lang.String already has a perfectly good equals method? Why on earth would I want to use a third party implementation?

It's a fair question. Let's look at some code, can you see any problems?

public void doStuffWithString(String stringParam) {
    if(stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

That's a NullPointerException waiting to happen!

There are a couple of ways around this:

public void safeDoStuffWithString1(String stringParam) {
    if(stringParam != null && stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

public void safeDoStuffWithString2(String stringParm) {
    if("MyStringValue".equals(stringParam)) {
        // do stuff
    }
}

Personally I'm not a fan of either method. I think null checks pollute code, and to me "MyStringValue".equals(stringParam) just doesn't scan well, it looks wrong.

This is where StringUtils.equals comes in handy, it's null safe. It doesn't matter what you pass it, it won't NullPointer on you! So you could rewrite the simple method as follows:

public void safeDoStuffWithString3(String stringParam) {
    if(StringUtils.equals(stringParam,"MyStringValue)) {
        // do stuff
    }
}

It's personal preference, but I think this reads better than the first two examples. There's nothing wrong with them, but I do think StringUtils.equals() is worth considering.



isEmpty, isNotEmpty, isBlank, isNotBlank

OK, these look pretty self explanatory, I'm guessing they're all null safe?

You're probably spotting a pattern here. isEmpty is indeed a null safe replacement for java.lang.String.isEmpty(), and isNotEmpty is it's inverse. So no more null checks:

  if(myString != null && !myString.isEmpty()) { // urghh
     // Do stuff with myString
  }

  if(StringUtils.isNotEmpty(myString)) { // much nicer
     // Do stuff with myString
  }

So, why Blank and Empty?

There is a difference, isBlank also returns true if the String just contains whitespace, ie...

  String someWhiteSpace = "    \t  \n";
  StringUtils.isEmpty(someWhiteSpace); // false
  StringUtils.isBlank(someWhiteSpace); // true


public static String[] split(String str, String separatorChars)

Right that looks just like String.split(), so this is just a null safe version of the built in Java method?

Well, yes it certainly is null safe. Trying to split a null string results in null, and a null separator splits on whitespace. But there is another reason you should consider using StringUtils.split(...), and that's the fact that java.lang.String.split takes a regular expression as a separator. For example the following may not do what you want:

public void possiblyNotWhatYouWant() {
    String contrivedExampleString = "one.two.three.four";
    String[] result = contrivedExampleString.split(".");
    System.out.println(result.length); // 0
}
But all I have to do is put a couple of backslashes in front of the '.' and it will work fine. It's not really a big deal is it?

Perhaps not, but there's one last advantage to using StringUtils.split, and that's the fact that regular expressions are expensive. In fact when I tested splitting a String on a comma (a fairly common use case in my experience), StingUtils.split runs over four times faster!



public static String join(Iterable iterable, String separator)

Ah, finally something genuinely useful!

Indeed I've never found an elegant way of concatenating strings with a separator, there's always that annoying conditional require to check if want to insert the separator or not. So it's nice there's a utility to this for me. Here's a quick example:

  String[] numbers = {"one", "two", "three"};
  StringUtils.join(numbers,",");  // returns "one,two,three"

There's also various overloaded versions of join that take Arrays, and Iterators.

Ok, I'm convinced. This looks like a pretty useful library, what else can it do?

Quite a lot, but like I said earlier I won't bother going through every single method available, I'd just end up repeating what's said in the API documentation. I'd really recommend taking a closer look: http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html

So basically if you ever need to do something with a String that isn't covered by Java's core String library (and maybe even stuff that is), take a look at StringUtils.