
I am trying to title-case some text that contains corporate names and their stock symbols.

Examples (each string is the corporate name, which gets title-cased, followed by the symbol in parentheses):

AT&T (T)
John Deere Inc. (DE)

These corporate names come from our database, which draws them from a stock pricing service. I have it working EXCEPT when the name is an abbreviation like AT&T.

As you may have guessed, that is returned as At&t. How can I preserve casing in abbreviations? I thought of using indexOf to get the position of any & and uppercasing the characters on either side of it, but that seems hackish.

Along the lines of (pseudocode):

var indexPos = myString.indexOf("&");
var fixedString = myString.charAt(indexPos - 1).toUpperCase().charAt(indexPos + 1).toUpperCase()

Oops, I forgot to include my title-case function:

function toTitleCase(str) {
    return str.replace(/([^\W_]+[^\s-]*) */g, function (txt) {
        return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
    });
}

Any better suggestions?

dinotom
  • It'd be better to get it right in the database to start with. – Lee Taylor May 13 '14 at 00:15
  • Agreed, but it comes into our database nightly and I can't control what's given by the service. Even so, I would then need to title-case it properly there, leading to the same problem, just in a different language – dinotom May 13 '14 at 00:18
  • What would be the expected result of e.g. _"Jack&Jill"_? – Paul S. May 13 '14 at 00:19
  • `Syntax error at line 1: expected identifier, got '&'` – Bergi May 13 '14 at 00:20
  • That's my point: the hack is a hack for one example, which then leads to multiple lines of bs code. The question is whether there is a simple way to handle that – dinotom May 13 '14 at 00:21
  • @Bergi, that's not real code, it's just an expression of a hack; I wouldn't use & in a variable in code. I made it clearer for you, see edit – dinotom May 13 '14 at 00:21
  • Unless you can define an exact standard or rule set for what makes an abbreviation an abbreviation (which I really don't think you can), I see no way of doing this. – Drazen Bjelovuk May 13 '14 at 00:23
  • Then please don't. If you're writing pseudo code, it should look like pseudo code (eg. omit the `var`s). Btw, `charAt` returns a single character (string of length 1), so applying `.charAt(&Index+1)` to the result makes absolutely no sense. – Bergi May 13 '14 at 00:24
  • How do you **define "abbreviation"**? Should the `DE` be titlecased as well? Why not? – Bergi May 13 '14 at 00:27
  • The problem is that there is no way to write code that correctly detects whether something is an abbreviation. PS: Why not use correct casing in the questions you ask on SO? "Title casing ...?" instead of "title casing ...?" :) – JK. May 13 '14 at 00:32
  • @Bergi The name gets title cased, the symbol in parentheses is added to the name string to get the full result. Stock symbols are always in capitals. – dinotom May 13 '14 at 00:33
  • @dinotom: Still, what makes `AT&T` an abbreviation while `John` is not? Is it the `&` sign? What about abbreviations like `DOM` or `W3C`, `ECMA` or `RegExp`? – Bergi May 13 '14 at 00:35
  • This is a specific use case: converting uneven text that comes from a stock service, where some of the corporate names come title-cased, some all lowercase and some all uppercase. I have no control over how they distribute the data, and if I had to fix it on the database input side, I could, but I would have the same problem in a different language. The examples you cited (DOM, ECMA etc.) aren't relevant to this use case. It may not be a solvable problem without a lot of extra coding; that's what I am asking. IS THERE A SIMPLE WAY TO DO THIS? The answer may be no. – dinotom May 13 '14 at 00:39

4 Answers


A better title-case function may be:

function toTitleCase(str) {
    return str.replace(
        /(\b.)|(.)/g,
        function ($0, $1, $2) {
            return ($1 && $1.toUpperCase()) || $2.toLowerCase();
        }
    );
}
toTitleCase("foo bAR&bAz a.e.i."); // "Foo Bar&Baz A.E.I."

This will still transform AT&T to At&T, because the input alone carries no information about which words should keep their capitals. In the end you need explicit fixes:

// specific fixes
     if (str === "At&T"  ) str = "AT&T";
else if (str === "Iphone") str = "iPhone";
// etc 
// or
var dict = {
    "At&T": "AT&T",
    "Iphone": "iPhone"
};
str = dict[str] || str;

Of course, if you can get it right when the data is first entered, it will save you a lot of trouble.
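Putting the two pieces above together, a minimal sketch might look like the following. The name `toTitleCaseWithExceptions` and the contents of the `exceptions` table are assumptions for illustration, not part of the original answer; the lookup is done per space-separated word so that a name like "At&T Inc." is also corrected.

```javascript
// Title-case a string: uppercase the character at every word boundary,
// lowercase everything else.
function toTitleCase(str) {
    return str.replace(/(\b.)|(.)/g, function (match, first, rest) {
        return first ? first.toUpperCase() : rest.toLowerCase();
    });
}

// Hypothetical exception table, keyed by what toTitleCase produces.
var exceptions = {
    "At&T": "AT&T",
    "Iphone": "iPhone"
};

// Title-case first, then swap in the known exceptions word by word.
function toTitleCaseWithExceptions(str) {
    return toTitleCase(str).split(" ").map(function (word) {
        // hasOwnProperty avoids accidental hits on inherited keys such as "constructor".
        return Object.prototype.hasOwnProperty.call(exceptions, word)
            ? exceptions[word]
            : word;
    }).join(" ");
}

console.log(toTitleCaseWithExceptions("AT&T INC."));       // "AT&T Inc."
console.log(toTitleCaseWithExceptions("john deere inc.")); // "John Deere Inc."
```

The table still has to be maintained by hand, but each entry is one line and the lookup cost does not grow with the number of exceptions.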

Paul S.
  • I was trying NOT to have to do the second part, as there are over 4000 symbols and corporate names, and probably at least 200 of them are like the AT&T example. That would be a long select case function, but it may be how it has to be done. See my post for why it's not done on the db side. – dinotom May 13 '14 at 00:44
  • @dinotom I've edited in another way to do it which trades cycles for memory, _key-property_ lookup in _JavaScript_ is very fast – Paul S. May 13 '14 at 01:41

This is a general solution for title case, without taking your extra requirements of "abbreviations" into account:

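  // The static String.toUpperCase is non-standard, so use an explicit callback.
  var fixedString = String(myString).toLowerCase().replace(/\b\w/g, function (c) { return c.toUpperCase(); });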

That said, I agree with the other posters that it's better to start with the data in the correct format in the first place. Not all proper names conform to title case; two examples are "Wernher von Braun" and "Ronald McDonald." There's really no algorithm you can program into a computer to handle the often arbitrary capitalization of proper names, just as you can't really program a computer to spell-check proper names.

However, you can certainly program in some exception cases, although I'm still not sure that simply assuming any word with an ampersand in it should be in all caps is always appropriate either. But that can be accomplished like so:

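// The static String.toUpperCase is non-standard, so use an explicit callback.
var upper = function (m) { return m.toUpperCase(); };
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var fixedString = titleCase.replace(/\b\w*\&\w*\b/g, upper);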

Note that your second example of "John Deere Inc. (DE)" still isn't handled properly, though. I suppose you could add some other logic to, say, put any word between parentheses in all caps, like so:

// The static String.toUpperCase is non-standard, so use an explicit callback.
var upper = function (m) { return m.toUpperCase(); };
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var titleCaseCapAmps = titleCase.replace(/\b\w*\&\w*\b/g, upper);
var fixedString = titleCaseCapAmps.replace(/\(.*\)/g, upper);

Which will at least handle your two examples correctly.

Dan Korn
  • Ty, as the post says, the AT&T (T) is concatenated, the name gets title cased first then the symbol in parens appended. – dinotom May 13 '14 at 00:48
  • Ah, I see. I didn't read your statement "concatenated as corporate name, which gets title cased and the symbol in parens" in the initial post to mean that. You can just do the first two replacements then. – Dan Korn May 13 '14 at 00:56
  • Also, to nitpick, technically "AT&T" is an acronym, not an abbreviation, while "Inc." actually is an abbreviation. I'm also compelled to reiterate what other posters have said: plenty of acronyms are likely to appear in company names in stock quotes. It doesn't seem farfetched at all that you might encounter a company name like "ABN AMRO" or "AOL Time Warner." There's no way to program an algorithm for those that I know of. – Dan Korn May 13 '14 at 01:02
  • Agreed, which is why I was asking if there is a simple way to do that; apparently there isn't – dinotom May 14 '14 at 09:46
  • Right, there's no algorithm to examine any arbitrary word and know whether it's an acronym which should be in all caps, or a proper name which should be in title case, or a "regular" word which would usually be in lowercase. – Dan Korn May 15 '14 at 19:16

How about this: since the number of companies registered with the stock exchange is finite, and there's a well-defined mapping between stock symbols and company names, your best bet is probably to program that mapping into your code and look up the company name by the ticker symbol, something like this:

var TickerToName = {
    A: "Agilent Technologies",
    AA: "Alcoa Inc."
    // etc., etc.
};

Then it's just a simple lookup to get the company name from the ticker symbol:

var symbol = "T";
var CompanyName = TickerToName[symbol] || "Unknown ticker symbol: " + symbol;

Of course, I would be very surprised if there was not already some kind of Web Service you could call to get back a company name from a stock ticker symbol, something like in this thread: Stock ticker symbol lookup API

Or maybe there's some functionality like this in the stock pricing service you're using to get the data in the first place.
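As a rough sketch of how the lookup table could sit alongside title-casing, the helper below prefers the curated name and only falls back to title-casing the raw name when the symbol is not in the table. The function name `displayName`, the fallback behaviour and the `T` entry are assumptions added for illustration; a real table would be loaded from the database or the pricing service.

```javascript
// Illustrative entries only.
var tickerToName = {
    A: "Agilent Technologies",
    AA: "Alcoa Inc.",
    T: "AT&T"
};

// Plain title-casing, used only when no curated name is available.
function toTitleCase(str) {
    return str.toLowerCase().replace(/\b\w/g, function (c) {
        return c.toUpperCase();
    });
}

// Hypothetical helper: build the "Name (SYMBOL)" display string.
function displayName(symbol, rawName) {
    var name = Object.prototype.hasOwnProperty.call(tickerToName, symbol)
        ? tickerToName[symbol]    // curated casing wins
        : toTitleCase(rawName);   // best-effort fallback
    return name + " (" + symbol + ")";
}

console.log(displayName("T", "AT&T"));             // "AT&T (T)"
console.log(displayName("DE", "JOHN DEERE INC.")); // "John Deere Inc. (DE)"
```

With this arrangement only the names that title-casing gets wrong need an entry in the table, rather than all of them.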

Dan Korn

The last time I faced this situation, I decided that it was less trouble to simply include the few exceptions here and there as needed.

var titleCaseFix = {
  "At&t": "AT&T"
};

function fixit(str) {
  for (var oldCase in titleCaseFix) {
    var newCase = titleCaseFix[oldCase];

    // Replace every occurrence of oldCase. split/join avoids having to
    // escape regex metacharacters. For other string replace options, see:
    // http://stackoverflow.com/questions/542232/in-javascript-how-can-i-perform-a-global-replace-on-string-with-a-variable-insi
    str = str.split(oldCase).join(newCase);
  }
  return str;
}
Jeremy J Starcher