
I'm writing a script in JS that will generate a "complete the sentence" type of quiz, for example:

The _______ brown fox jumps over the lazy _____.

Step 1: The user will be presented with a textbox to put a sentence in.
Step 2: The sentence will be broken up into an array of actual words (no commas, full stops, etc.).
Step 3: In the background, I'll loop through each of the words and wrap them in an anchor <a />.
Step 4: The user will be able to click on one or many words to mark them for the quiz taker to complete. I will do some validation around this.
Step 5: The end result will be a sentence with blank spaces for the words that were selected, each padded with a random number (1-n) of extra characters so the blank gives no exact hint of the word's length.
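
To make step 5 concrete, here is a rough sketch of how a blank could be generated. The function name `makeBlank`, the `maxExtra` parameter and the injectable `rng` argument are illustrative assumptions, not part of the existing script.

```javascript
// Hypothetical helper for step 5: build a blank for a selected word.
// The blank is the word's length plus a random 1..maxExtra extra underscores,
// so its length does not reveal the exact length of the word.
// `rng` defaults to Math.random and is only a parameter to make testing easy.
function makeBlank(word, maxExtra, rng) {
  var random = rng || Math.random;
  // Math.floor(random() * maxExtra) is in 0..maxExtra-1, so extra is in 1..maxExtra.
  var extra = 1 + Math.floor(random() * maxExtra);
  return "_".repeat(word.length + extra);
}
```

For example, `makeBlank("fox", 5)` returns between 4 and 8 underscores.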

I'm OK with most of the functionality, but I need to split the sentence into exact words (presumably using a regex).

There are a few rules around this: commas and full stops should be ignored, as should any special characters. I'm also thinking of limiting the input so that special characters aren't allowed, to make this an easier task. Single quotes and hyphens should be included in the word matching, as some words contain these.

There may be other rules I can't think of, so I'm very happy for you to leave a comment and suggest them.

I have started with a basic jsFiddle which simply separates by spaces.

Thanks for reading.

Marko

2 Answers


Split by non-words

A word is any run of letters, single quotes and hyphens; any run of other characters is a non-word, i.e. a separator.

To achieve this, change the split statement to the following:

var textArray = text.split(/[^a-zA-Z'-]+/)

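JavaScript only splits on a regex if the pattern is written as a regex literal between / delimiters (or passed as a RegExp object); a quoted string is treated as literal text.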

To retain the separators, capture them using match and reinsert them in the order captured as you go. Note the g flag: without it, match returns only the first separator.

var splitArray = text.match(/[^a-zA-Z'-]+/g)
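
Putting the two arrays together could look roughly like the sketch below. The function name `wrapWords` and the `href="#"` anchor markup are assumptions for illustration; adapt them to whatever your click handling needs.

```javascript
// Hypothetical helper: wrap each word in an anchor and keep the punctuation.
function wrapWords(text) {
  var separatorPattern = /[^a-zA-Z'-]+/g;

  // Words, including empty strings if the text starts or ends with a separator.
  var textArray = text.split(separatorPattern);
  // Separators in order of appearance; match returns null when there are none.
  var splitArray = text.match(separatorPattern) || [];

  var result = "";
  for (var i = 0; i < textArray.length; i++) {
    // Skip the empty entries that split produces at the edges.
    if (textArray[i] !== "") {
      result += '<a href="#">' + textArray[i] + "</a>";
    }
    // splitArray has one entry fewer than textArray, so guard the index.
    if (i < splitArray.length) {
      result += splitArray[i];
    }
  }
  return result;
}
```

Because split keeps the empty strings at the start and end, the separators always slot in between consecutive entries of `textArray`. The separators are inserted unescaped here, so escape them first if the input may contain characters such as `<` or `&`.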
krlmlr
  • This works however I would like to retain the punctuation so that the sentence isn't broken. – Marko Jul 19 '12 at 04:20
  • Then capture the separators using `text.match` and reinsert them when constructing the sentence. – krlmlr Jul 19 '12 at 05:49
  • After adding the `` to the result, add also the corresponding entry of `splitArray` to the result. Note that the size of `splitArray` is one less than that of `textArray`. – krlmlr Jul 22 '12 at 23:25

Hmm... I have a pretty simple solution:

[\w'-]+

That's it.

Works fine for this line:

I like 2 have "icecream", dude's and dude-ettes.

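Yeah, that's a strange sentence above, but it worked as a test case. Try it. It'll include the number 2 there as a word, since \w matches digits and underscores as well as letters. Not sure if you want that. To allow any other special characters, just add them inside the brackets before the hyphen (keep the hyphen last so it stays a literal hyphen).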
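
If you want to try it quickly, something like this should do. The function name `extractWords` is just made up for the example, and the g flag is needed so that match returns every word rather than only the first.

```javascript
// Hypothetical helper: return every word matched by [\w'-]+ in the text.
function extractWords(text) {
  // match returns null when nothing matches, so fall back to an empty array.
  return text.match(/[\w'-]+/g) || [];
}

var sentence = 'I like 2 have "icecream", dude\'s and dude-ettes.';
var words = extractWords(sentence);
// words is ["I", "like", "2", "have", "icecream", "dude's", "and", "dude-ettes"]
```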

Jason Antic