2

I am trying to create a map by taking the first character of each word and it's position in a sentence/paragraph. I am using regex pattern to achieve this. Regex is a costly operation. Are there are any ways to achieve this?

Regex way:

public static void getFirstChar(String paragraph) {
    Pattern pattern = Pattern.compile("(?<=\\b)[a-zA-Z]");
    Map newMap = new HashMap();

    Matcher fit = pattern.matcher(paragraph);
    while (fit.find()) {
        newMap.put((fit.group().toString().charAt(0)), fit.start());
    }
}
Gumbo
  • 643,351
  • 109
  • 780
  • 844
Radhika
  • 423
  • 7
  • 22
  • You wouldn’t need the look-behind assertion as `\b` already is an assertion that doesn’t consume any character. – Gumbo May 11 '10 at 07:31
  • 1
    Don't use raw types in new code! http://stackoverflow.com/questions/2770321/what-is-a-raw-type-and-why-shouldnt-we-use-it/ – polygenelubricants May 11 '10 at 07:41
  • 2
    "Regex is a costly operation" - did you profile, or is that your assumption? This particular regex shouldn't be too costly and is asymptotically the same as doing your own linear scan. – polygenelubricants May 11 '10 at 07:44
  • Don't use raw types in new code! - This was just a sample code that I wrote to post it in the forum. Sure will take care. did you profile, or is that your assumption - The response time from the method using this operation was long. Hence tried for a better option. – Radhika May 12 '10 at 06:38

2 Answers2

0

Python:

wmap = {}
prev = 0
for word in "the quick brown fox jumps over the lazy dog".split():
    wmap[word[0]] = prev
    prev += len(word) + 1

print wmap

If a letter appears more than once as the first letter of a word it'll map to the last position. For a list of all positions change wmap[word[0]] = prev to:

if word[0] in wmap:
    wmap[word[0]].append(prev)
else:
    wmap[word[0]] = [prev]
bboe
  • 4,092
  • 3
  • 29
  • 39
  • It should be pretty easy to translate this code to java. I'm pretty sure split is the same, and your use of map is the same. Can't make it too easy for ya :) – bboe May 11 '10 at 07:41
  • This `split` works for spaces. The original code finds each letter following a non-letter, which might be an apostrophe (it finds the `m` in `I'm`). – Christian Semrau May 11 '10 at 21:14
0

You can do your own linear scan if you really need to squeeze every bit of performance:

                 //0123456789012345678901
    String text = "Hello,my name is=Helen";
    Map<Character,Integer> map = new HashMap<Character,Integer>();

    boolean lastIsLetter = false;
    for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        boolean currIsLetter = Character.isLetter(ch);
        if (!lastIsLetter && currIsLetter) {
            map.put(ch, i);
        }
        lastIsLetter = currIsLetter;
    }

    System.out.println(map);
    // prints "{n=9, m=6, H=17, i=14}"

API links

polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • @Radhika: consider accepting the answer (clicking on the green check on the left) if an answer is satisfactory. If you still need help with the question, explain your concerns and I'll try to address it. – polygenelubricants May 12 '10 at 07:31