
I need to find the length of my string "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்". I get a length of 45, but I expect 59. I think I need to add conditions for spaces and dots (.) to the regular expression. My code:

import java.util.*;
import java.util.regex.*;

class UnicodeLength
{
    public static void main(String[] args)
    {
        String s = "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்பி நேற்று தேர்தல் ஆணையர் விசம்பத்";
        List<String> characters = new ArrayList<String>();
        // One match = one letter followed by any combining marks (vowel signs, virama)
        Pattern pat = Pattern.compile("\\p{L}\\p{M}*");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }

        // Test if we have the right characters and length
        System.out.println(characters);
        System.out.println("String length: " + characters.size());
    }
}

Dhinakar
  • Can you explain why you think it should be 59? (I obviously don't know that language.) – Sotirios Delimanolis Jun 06 '14 at 05:12
  • 1 பா 2 ர 3 தீ 4 ய 5 6 ஜ 7 ன 8 தா 9 10 இ 11 ளை 12 ஞ 13 ர் 14 15 அ 16 ணி 17 18 த 19 லை 20 வ 21 ர் 22 23 அ 24 னு 25 ரா 26 க் 27 சி 28 ங் 29 30 தா 31 கூ 32 ர் 33 34 எ 35 ம் 36 . 37 பி 38 . 39 40 நே 41 ற் 42 று 43 44 தே 45 ர் 46 த 47 ல் 48 49 ஆ 50 ணை 51 ய 52 ர் 53 54 வி 55 . 56 ச 57 ம் 58 ப 59 த் – Dhinakar Jun 06 '14 at 05:19
  • You should print out the strings you add to your list. Java seems to process those characters differently than you are suggesting. – Sotirios Delimanolis Jun 06 '14 at 05:31
  • @Dhinakar In your comment you listed the expected strings, but there are _empty places_: your numbers 5 and 9, for example, do not map to anything. That is the difference. Why do you think that pattern matching should find such _empty_ letters? Just a hint: with the pattern `\p{L}?\p{M}*` you will get those _empties_ (but still only 57). – Seelenvirtuose Jun 06 '14 at 06:47
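To see why the question's pattern skips the "empty places", it can help to print the Unicode general category of each code point, as the comments suggest. This is a minimal sketch built on `Character.getType`; the class name `CategoryDemo` and its helpers are hypothetical. A space is `Zs` and a full stop is `Po`, so neither `\p{L}` nor `\p{M}` can match them, while the Tamil virama is a mark (`Mn`) and attaches to the preceding letter.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryDemo {
    // Maps a few Character.getType() constants to short names (illustrative subset only)
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.OTHER_LETTER:           return "Lo (letter)";
            case Character.NON_SPACING_MARK:       return "Mn (mark)";
            case Character.COMBINING_SPACING_MARK: return "Mc (mark)";
            case Character.SPACE_SEPARATOR:        return "Zs (space)";
            case Character.OTHER_PUNCTUATION:      return "Po (punctuation)";
            default:                               return "other";
        }
    }

    // Describes every code point in s as "U+XXXX category"
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp ->
            out.add(String.format("U+%04X %s", cp, categoryName(cp))));
        return out;
    }

    public static void main(String[] args) {
        // TAMIL LETTER RA + TAMIL SIGN VIRAMA (as in ர்), then a space and a full stop
        for (String line : describe("\u0BB0\u0BCD .")) {
            System.out.println(line);
        }
    }
}
```

Because only `Lo`, `Mn` and `Mc` code points are consumed by `\p{L}\p{M}*`, the 11 spaces and 3 dots never show up in the list.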

1 Answer


The code below worked for me. There were three issues that I fixed:

  1. I added a check for spaces to your regular expression.
  2. I added a check for punctuation to your regular expression.
  3. I pasted the string from your comment into the string in your code. They weren't the same: the literal in your code is missing the dots in எம்.பி. and வி.சம்பத்.
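The third point can be checked directly by comparing the two literals. This is a minimal sketch; the class name `LiteralDiff` and the helper `countChar` are hypothetical, and the two strings are copied from the question text and from the question's code.

```java
public class LiteralDiff {
    // String from the question text and comment (with dots)
    static final String FROM_COMMENT =
        "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்";
    // String literal from the question's code (dots missing)
    static final String FROM_CODE =
        "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்பி நேற்று தேர்தல் ஆணையர் விசம்பத்";

    // Counts occurrences of one BMP character in a string
    static long countChar(String s, char target) {
        return s.chars().filter(c -> c == target).count();
    }

    public static void main(String[] args) {
        System.out.println("Equal: " + FROM_COMMENT.equals(FROM_CODE));
        System.out.println("Dots in comment string: " + countChar(FROM_COMMENT, '.'));
        System.out.println("Dots in code string: " + countChar(FROM_CODE, '.'));
        // Stripping the dots from the comment string should reproduce the code's literal
        System.out.println("Same apart from dots: "
            + FROM_COMMENT.replace(".", "").equals(FROM_CODE));
    }
}
```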

Here's the code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeLength {
    public static void main(String[] args) {
        String s = "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்";
        List<String> characters = new ArrayList<String>();
        // A punctuation mark, or a letter with its combining marks, or a single space
        Pattern pat = Pattern.compile("\\p{P}|\\p{L}\\p{M}*| ");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }
        // Test if we have the right characters and length
        int i = 1;
        for (String character : characters) {
            System.out.println(String.format("%d = [%s]", i++, character));
        }
        System.out.println("Characters Size: " + characters.size());
    }
}
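To see where the 59 comes from, the matches can be tallied by kind. This sketch assumes the answer's pattern is used unchanged; the class `MatchTally` and the method `tally` are hypothetical names. For the string with dots it should report 45 letter clusters (the same 45 the question's pattern found), 11 spaces and 3 punctuation marks, which add up to the expected 59.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTally {
    static final String TEXT =
        "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்";

    // Returns {letter clusters, spaces, punctuation marks} found by the answer's pattern
    static int[] tally(String s) {
        Pattern pat = Pattern.compile("\\p{P}|\\p{L}\\p{M}*| ");
        Matcher matcher = pat.matcher(s);
        int letters = 0, spaces = 0, punctuation = 0;
        while (matcher.find()) {
            String g = matcher.group();
            if (g.equals(" ")) {
                spaces++;
            } else if (g.matches("\\p{P}")) {
                punctuation++;
            } else {
                letters++; // one letter plus any combining marks
            }
        }
        return new int[] {letters, spaces, punctuation};
    }

    public static void main(String[] args) {
        int[] t = tally(TEXT);
        System.out.println("Letter clusters: " + t[0]);
        System.out.println("Spaces: " + t[1]);
        System.out.println("Punctuation: " + t[2]);
        System.out.println("Total: " + (t[0] + t[1] + t[2]));
    }
}
```

Splitting the count this way makes it clear that the extra 14 items are exactly the spaces and dots that the original `\p{L}\p{M}*` pattern could never match.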

It's probably worth pointing out that your code is remarkably similar to the solution to this SO question. One comment on that solution in particular led me to discover the missing punctuation check in your code and to notice that the string in your comment didn't match the string in your code.

Erik Gillespie