Tuesday, 31 January 2012

Apache Commons Lang StringUtils

So, thought it'd be good to talk about another Java library that I like. It's been around for a while and is not perhaps the most exciting library, but it is very very useful. I probably make use of it daily.

org.apache.commons.lang.StringUtils

StringUtils is part of Apache Commons Lang (http://commons.apache.org/lang/, and as the name suggest it provides some nice utilities for dealing with Strings, going beyond what is offered in java.lang.String. It consists of over 50 static methods, and I'm not going to cover every single one of them, just a selection of methods that I make the most use of.

There are two different versions available, the newer org.apache.commons.lang3.StringUtils and the older org.apache.commons.lang.StringUtils. There are not really any significant differences between the two. lang3.StringUtils requires Java 5.0 and is probably the version you'll want to use.



public static boolean equals(CharSequence str1, CharSequence str2)

Thought I'd start with one of the most straight forward methods. equals. This does exactly what you'd expect, it takes two Strings and returns true if they are identical, or false if they're not.

But java.lang.String already has a perfectly good equals method? Why on earth would I want to use a third party implementation?

It's a fair question. Let's look at some code, can you see any problems?

public void doStuffWithString(String stringParam) {
    if(stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

That's a NullPointerException waiting to happen!

There are a couple of ways around this:

public void safeDoStuffWithString1(String stringParam) {
    if(stringParam != null && stringParam.equals("MyStringValue")) {
        // do stuff
    }
}

public void safeDoStuffWithString2(String stringParm) {
    if("MyStringValue".equals(stringParam)) {
        // do stuff
    }
}

Personally I'm not a fan of either method. I think null checks pollute code, and to me "MyStringValue".equals(stringParam) just doesn't scan well, it looks wrong.

This is where StringUtils.equals comes in handy, it's null safe. It doesn't matter what you pass it, it won't NullPointer on you! So you could rewrite the simple method as follows:

public void safeDoStuffWithString3(String stringParam) {
    if(StringUtils.equals(stringParam,"MyStringValue)) {
        // do stuff
    }
}

It's personal preference, but I think this reads better than the first two examples. There's nothing wrong with them, but I do think StringUtils.equals() is worth considering.



isEmpty, isNotEmpty, isBlank, isNotBlank

OK, these look pretty self explanatory, I'm guessing they're all null safe?

You're probably spotting a pattern here. isEmpty is indeed a null safe replacement for java.lang.String.isEmpty(), and isNotEmpty is it's inverse. So no more null checks:

  if(myString != null && !myString.isEmpty()) { // urghh
     // Do stuff with myString
  }

  if(StringUtils.isNotEmpty(myString)) { // much nicer
     // Do stuff with myString
  }

So, why Blank and Empty?

There is a difference, isBlank also returns true if the String just contains whitespace, ie...

  String someWhiteSpace = "    \t  \n";
  StringUtils.isEmpty(someWhiteSpace); // false
  StringUtils.isBlank(someWhiteSpace); // true


public static String[] split(String str, String separatorChars)

Right that looks just like String.split(), so this is just a null safe version of the built in Java method?

Well, yes it certainly is null safe. Trying to split a null string results in null, and a null separator splits on whitespace. But there is another reason you should consider using StringUtils.split(...), and that's the fact that java.lang.String.split takes a regular expression as a separator. For example the following may not do what you want:

public void possiblyNotWhatYouWant() {
    String contrivedExampleString = "one.two.three.four";
    String[] result = contrivedExampleString.split(".");
    System.out.println(result.length); // 0
}
But all I have to do is put a couple of backslashes in front of the '.' and it will work fine. It's not really a big deal is it?

Perhaps not, but there's one last advantage to using StringUtils.split, and that's the fact that regular expressions are expensive. In fact when I tested splitting a String on a comma (a fairly common use case in my experience), StingUtils.split runs over four times faster!



public static String join(Iterable iterable, String separator)

Ah, finally something genuinely useful!

Indeed I've never found an elegant way of concatenating strings with a separator, there's always that annoying conditional require to check if want to insert the separator or not. So it's nice there's a utility to this for me. Here's a quick example:

  String[] numbers = {"one", "two", "three"};
  StringUtils.join(numbers,",");  // returns "one,two,three"

There's also various overloaded versions of join that take Arrays, and Iterators.

Ok, I'm convinced. This looks like a pretty useful library, what else can it do?

Quite a lot, but like I said earlier I won't bother going through every single method available, I'd just end up repeating what's said in the API documentation. I'd really recommend taking a closer look: http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html

So basically if you ever need to do something with a String that isn't covered by Java's core String library (and maybe even stuff that is), take a look at StringUtils.

3 comments:

Tomáš Záluský said...

I personally prefer Guava over commons-lang. Splitting and joining are IMHO more powerful and helper methods like emptyToNull etc. are more universal.
However commons-lang is awesome for specific use cases like Levenshtein distance or very powerful ToStringBuilder + ToStringStyle API.

Eric Jablow said...

Java 7 has an Objects class with a method with signature

public static boolean equals(Object a, Object b)

This makes StringUtils.equals(String a, String b) unnecessary. However, programmers should ask why they are allowing nulls in the first place. Sometimes it can't be helped, but sometimes it's just laziness.

Archimedes Trajano said...

I think I'll stick with commons-lang3 primarily for the ToStringBuilder API which provide some useful shortcuts for debugging.