I need a way to fix capitalization in abbreviations found within a String. Assume all abbreviations are correctly spaced.

For example,

"Robert a.k.a. Bob A.k.A. dr. Bobby"

becomes:

"Robert A.K.A. Bob A.K.A. Dr. Bobby"

Correctly capitalized abbreviations will be known ahead of time, stored in a Collection of some sort.

I was thinking of an algorithm like this:

private String fix(String s) {
    StringBuilder builder = new StringBuilder();
    for (String word : s.split(" ")) {
        if (collection.contains(word.toUpperCase())) {
            // word = correct abbreviation here
        }
        builder.append(word);
        builder.append(" ");
    }
    return builder.toString().trim();
}

But as far as I know, this approach fails in a couple of cases:

  • If the abbreviation contains a lowercase letter (Dr.), the upper-cased word never matches it.
  • If the word starts or ends with punctuation ("a.k.a."), it does not match its entry either.

I have a feeling that this can be solved with a regex, iteratively matching and replacing the correct abbreviation. But if not, how should I approach this problem?
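The regex idea could look roughly like the following minimal sketch. The class and method names (`RegexAbbreviationFixer`, `fix`) are hypothetical, and it assumes abbreviations are separated from their neighbours by single spaces, as stated above. Each known abbreviation is quoted literally, matched case-insensitively, and replaced with its canonical spelling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAbbreviationFixer {

    // Replaces every case-insensitive occurrence of a known abbreviation
    // with its canonical spelling. (Hypothetical helper for illustration.)
    public static String fix(String s, List<String> canonicalForms) {
        String result = s;
        for (String form : canonicalForms) {
            // (?<![^ ]) and (?![^ ]) require a space or the string boundary on
            // both sides, so only whole space-delimited tokens are matched.
            Pattern p = Pattern.compile(
                    "(?<![^ ])" + Pattern.quote(form) + "(?![^ ])",
                    Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(form));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("A.K.A.", "Dr.");
        System.out.println(fix("Robert a.k.a. Bob A.k.A. dr. Bobby", known));
    }
}
```

This makes one pass over the string per abbreviation, so it may be slow with a long list of abbreviations or a lot of text.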

budi
  • Do you want to capitalize the first letter of every word? And a word being defined by a string sequence followed by a period or a blank space. Is this correct? – blurfus Oct 08 '15 at 00:21
  • Possible duplicate of [Regex capitalize first letter every word, also after a special character like a dash](http://stackoverflow.com/questions/6251463/regex-capitalize-first-letter-every-word-also-after-a-special-character-like-a) – blurfus Oct 08 '15 at 00:25
  • I currently have a list of all the abbreviations I want to correct. So if there is some non-sensical abbreviation like `"o.o."`, it will not be corrected. – budi Oct 08 '15 at 00:38
  • Oh, I see - let me see if my answer covers this non-sensical abbreviation – blurfus Oct 08 '15 at 00:39
  • How could you really _know_ if an abbreviation is "non-sensical"? White-listing all possible abbreviations feels like over-complicating things.. – Mick Mnemonic Oct 08 '15 at 00:46
  • This post may be of some value to you with regard to processing Strings into title case: http://stackoverflow.com/questions/1086123/string-conversion-to-title-case/15738441#15738441. – scottb Oct 08 '15 at 01:03

3 Answers

Instead of using a regex or rolling your own implementation, I would suggest using a utility library. WordUtils in Apache Commons Lang is perfect for the job:

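// Assumes Commons Lang 3: import org.apache.commons.lang3.text.WordUtils;
String input = "Robert a.k.a. Bob A.k.A. dr. Bobby";
// Capitalize the first letter after every '.' and ' ' delimiter
String capitalized = WordUtils.capitalize(input, '.', ' ');
System.out.println(capitalized);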

This prints out

Robert A.K.A. Bob A.K.A. Dr. Bobby
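
For readers without Commons Lang on the classpath, the delimiter-based capitalization can be approximated in plain Java. This is only a sketch of the technique, not the library's implementation, and `DelimiterCapitalizer` is a hypothetical name.

```java
public class DelimiterCapitalizer {

    // Upper-cases the first character of the input and every character
    // that directly follows one of the given delimiters.
    public static String capitalize(String input, char... delimiters) {
        StringBuilder out = new StringBuilder(input.length());
        boolean capitalizeNext = true;
        for (char c : input.toCharArray()) {
            if (isDelimiter(c, delimiters)) {
                out.append(c);
                capitalizeNext = true;
            } else if (capitalizeNext) {
                out.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static boolean isDelimiter(char c, char[] delimiters) {
        for (char d : delimiters) {
            if (c == d) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(capitalize("Robert a.k.a. Bob A.k.A. dr. Bobby", '.', ' '));
    }
}
```

Note that this sketch capitalizes every word, not only the known abbreviations, so an input such as "the o.o." comes out as "The O.O.".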
Mick Mnemonic

You do not have to use a regex; your solution looks reasonable (although it may be slow if you have a lot of data to process).

For abbreviations containing lowercase letters, e.g. Dr., you could use a case-insensitive string comparison rather than toUpperCase. That only helps if you are comparing the strings directly yourself, though. What you really need is a case-insensitive map. Perhaps:

Map<String, String> collection = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);

If the abbreviation starts or ends with punctuation, then make sure the corresponding key in your collection does too.
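
A minimal sketch of how such a map could be used, assuming each canonical abbreviation is stored as both key and value (the class name `AbbreviationFixer` is hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

public class AbbreviationFixer {

    // Keys compare case-insensitively; values hold the canonical spelling.
    private final Map<String, String> abbreviations =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    public AbbreviationFixer(String... canonicalForms) {
        for (String form : canonicalForms) {
            abbreviations.put(form, form);
        }
    }

    public String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
            // Unknown words fall through unchanged via getOrDefault.
            builder.append(abbreviations.getOrDefault(word, word)).append(' ');
        }
        return builder.toString().trim();
    }

    public static void main(String[] args) {
        AbbreviationFixer fixer = new AbbreviationFixer("A.K.A.", "Dr.");
        System.out.println(fixer.fix("Robert a.k.a. Bob A.k.A. dr. Bobby"));
    }
}
```

Lookups in the TreeMap are logarithmic in the number of abbreviations, which avoids scanning the whole collection for every word.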

dave

This is how I went about it (updated after reading the OP's comments). It prints:

Robert A.K.A. Bob A.K.A. Dr. Bobby The o.o.

import java.util.ArrayList;
import java.util.List;

public class Fixer {

    List<String> collection = new ArrayList<>();

    public Fixer() {
        collection.add("Dr.");
        collection.add("A.K.A.");
        collection.add("o.o.");
    }

    /* app entry point */
    public static void main(String[] args) {
        String testCase = "robert a.k.a. bob A.k.A. dr. bobby the o.o.";

        Fixer l = new Fixer();
        String result = l.fix(testCase);

        System.out.println(result);
    }

    private String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
            String abbr = getAbbr(word);
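            if (abbr != null) {
                builder.append(abbr);
            } else if (!word.isEmpty()) {
                // Not a known abbreviation: capitalize the first letter only.
                // Empty tokens (from consecutive spaces) are skipped to avoid
                // a StringIndexOutOfBoundsException in substring(0, 1).
                builder.append(word.substring(0, 1).toUpperCase());
                builder.append(word.substring(1));
            }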
            builder.append(" ");
        }
        return builder.toString().trim();
    }

    private String getAbbr(String word) {
        for (String abbr : collection) {
            if (abbr.equalsIgnoreCase(word)) {
                return abbr;
            }
        }
        return null;
    }
}
blurfus