I'd like to put each individual Japanese character into an array. For example, entering 攻壳机动队 into an HTML textarea should produce the array ['攻','壳','机','动','队'] in JavaScript. Duplicates should be kept.

I'd like to split on punctuation and spaces, but Japanese sentences don't have spaces, so I'm not sure how to take each individual character and put it into an array. (I know some words consist of multiple characters, but for now I am only interested in separating each character into an array; multi-character words would be the next step.)

irregular
  • possible duplicate of [How do you get a string to a character array in JavaScript?](http://stackoverflow.com/questions/4547609/how-do-you-get-a-string-to-a-character-array-in-javascript) – Sam Hanley Dec 15 '14 at 20:26
  • That will probably work; I'll try it when I get the chance. Thank you – irregular Dec 15 '14 at 20:28
  • Trick: use `.split('')`. Mmmmm. I like Ghost in the Shell, too! – Terry Dec 15 '14 at 20:30
  • Do you want it to work with rare characters that are not in the BMP and therefore consist of two half-characters, a "surrogate pair"? – hippietrail Feb 13 '15 at 18:22

1 Answer

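Just using `myString.split("")` will split the string into individual characters. This works for characters in the Basic Multilingual Plane, which covers most everyday Japanese and Chinese text; rarer characters outside it are stored as surrogate pairs, and `split("")` would cut those in half.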
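As a minimal sketch of the above, using the string from the question: the helper names `splitByCodeUnit` and `splitByCodePoint` are made up for illustration. The second helper uses `Array.from`, which iterates by code point and is one way to handle the surrogate pairs raised in the comments.

```javascript
// Hypothetical helper: splits on UTF-16 code units, as split("") does.
function splitByCodeUnit(text) {
  return text.split("");
}

// Hypothetical helper: Array.from iterates the string by code point,
// so characters outside the BMP stay in one piece.
function splitByCodePoint(text) {
  return Array.from(text);
}

const chars = splitByCodeUnit("攻壳机动队");
console.log(chars); // [ '攻', '壳', '机', '动', '队' ]

// "\u{20BB7}" is a rare character stored as a surrogate pair.
const rare = "\u{20BB7}野家";
console.log(splitByCodeUnit(rare).length);  // 4 (the first character is cut in two)
console.log(splitByCodePoint(rare).length); // 3
```

For text typed into a textarea, the same call would be applied to the element's `value`.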

As for the second part, I think you'll find that to be very difficult. It's the same difficulty as the English case of splitting the string thisismyexamplestring into coherent words. The computer won't know offhand where the word boundaries are, and you can't really write general rules that state where a split should occur so as to account for multi-character words.

If, for example, you had a textarea that asked the user to talk about their computer, then the character '电' would most likely be followed by the character '脑', and you could probably apply some logic to combine those two characters into one array element, but that might not always be the case.
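One way to sketch that kind of logic is a greedy longest-match against a word list. Everything here is an assumption for illustration: the function name `groupKnownWords`, the one-entry word list, the sample sentence, and the `maxWordLength` default. A real application would need a far larger dictionary and would still mis-split ambiguous text.

```javascript
// Hypothetical word list; only the pair from the example above.
const knownWords = new Set(["电脑"]);

// Hypothetical helper: walks the characters left to right and, at each
// position, keeps the longest run (2..maxWordLength characters) found in
// the word list; otherwise it emits the single character.
function groupKnownWords(text, words, maxWordLength = 4) {
  const chars = Array.from(text);
  const result = [];
  let i = 0;
  while (i < chars.length) {
    let matched = null;
    const longest = Math.min(maxWordLength, chars.length - i);
    for (let len = longest; len >= 2; len--) {
      const candidate = chars.slice(i, i + len).join("");
      if (words.has(candidate)) {
        matched = candidate;
        i += len;
        break;
      }
    }
    if (matched === null) {
      matched = chars[i];
      i += 1;
    }
    result.push(matched);
  }
  return result;
}

console.log(groupKnownWords("我的电脑很快", knownWords));
// [ '我', '的', '电脑', '很', '快' ]
```

Characters that are not part of a known word fall through as single-element entries, so the output degrades to the plain per-character split when the word list is empty.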

I used Chinese in my example, but the principle is the same (I don't know Japanese, sorry).

user3334871