
For example, I have something like

The study of standards for what is right and what is wrong is called _.

a. pure science

b. applied science

c. ethics

d. technology

... unknown number of choices ...

ANS: C

and I want to split it into

['The study of standards for what is right and what is wrong is called _____.',
'pure science',
'applied science',
'ethics',
'technology',

... as many array elements as there are choices ... 

'ANS: C']

Is there a single regex I can use that will work with an arbitrary number of choices? If not, how would you go about doing this in either JavaScript or PHP?

jack97
  • Is there guaranteed to be a line-break between each solution? – levi Nov 05 '13 at 01:54
  • yep. javascript split() function with "\na.", "\nb.", "\nc." is what I'm doing right now, but it's a very static solution and I'm only checking up to "\ne.". I was hoping for something more dynamic. – jack97 Nov 05 '13 at 02:32
  • what is original source? txt file? Is there any structure to it? Seems like a bit of a garbage-in-garbage-out problem based on format shown without some layout criteria – charlietfl Nov 05 '13 at 02:57

2 Answers


Is there guaranteed to be a line-break between each solution? Yep...

In JavaScript, you can use one of the following solutions.

Using the split method you can do the following:

results = myString.split(/[\r\n]+/);

Using the match() method you can do the following. This matches the runs of characters that are not line breaks.

results = myString.match(/[^\r\n]+/g);
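The two JavaScript calls above are not quite interchangeable when the input begins or ends with a line break. Here is a minimal sketch of the difference, assuming a made-up sample string with Windows line endings and a trailing break (the variable names are only for illustration):

```javascript
// Sample input with a trailing line break, as text read from a file often has.
const text = "Question?\r\na. first\r\nb. second\r\nANS: A\r\n";

// split() cuts at every run of line breaks, so the trailing break
// leaves an empty string as the last element.
const viaSplit = text.split(/[\r\n]+/);

// match() collects the runs of non-linebreak characters instead, so nothing
// is collected after the final break. It returns null when there is no
// match at all, hence the fallback to an empty array.
const viaMatch = text.match(/[^\r\n]+/g) || [];

console.log(viaSplit.length); // 5 (last element is "")
console.log(viaMatch.length); // 4
```

If the source text may have leading or trailing blank lines, the match() form saves a filtering step.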

In PHP, you can accomplish your desired task using one of the following solutions.

$wanted = preg_split('~\R+(?!$)~u', $data);
print_r($wanted);


\R matches a generic newline, that is, anything Unicode considers a linebreak sequence. This includes all characters matched by \v (vertical whitespace) and the two-character sequence \x0D\x0A. The u modifier makes PCRE treat both the pattern and the subject string as UTF-8, which \R needs in order to recognize multi-byte line separators such as U+2028 in UTF-8 input.

I used a negative lookahead with $ (end of the string) after \R+ so that a line break at the very end of the input does not produce an empty trailing element.

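You can also avoid splitting altogether and instead match the runs of non-linebreak characters with a negated character class. Note that preg_match_all() returns the number of matches; the matched lines themselves end up in $matches[0].

preg_match_all('~[^\r\n]+~', $data, $matches);
print_r($matches[0]);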


Output

Array
(
    [0] => The study of standards for what is right and what is wrong is called _.
    [1] => a. pure science
    [2] => b. applied science
    [3] => c. ethics
    [4] => d. technology
    [5] => ... unknown number of choices ...
    [6] => ANS: C
)
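The output above still carries the "a.", "b." labels, whereas the question asked for the bare choice text. A minimal JavaScript sketch of that extra step follows. It assumes every choice label is a single letter followed by a period and whitespace; splitQuestion and sample are hypothetical names, not part of the original post.

```javascript
// Hypothetical helper: split a question block into lines and drop the
// leading "a. ", "b. ", ... labels from the choice lines.
function splitQuestion(text) {
  // Collect every run of non-linebreak characters (blank lines vanish).
  const lines = text.match(/[^\r\n]+/g) || [];
  // Assumption: a choice label is one letter, a period, then whitespace.
  // "ANS: C" is left alone because its second character is not a period.
  return lines.map((line) => line.trim().replace(/^[a-z]\.\s+/i, ""));
}

const sample = [
  "The study of standards for what is right and what is wrong is called _.",
  "",
  "a. pure science",
  "",
  "b. applied science",
  "",
  "c. ethics",
  "",
  "d. technology",
  "",
  "ANS: C",
].join("\n");

const parts = splitQuestion(sample);
console.log(parts);
```

Because the label pattern is not tied to a fixed set of letters, this handles any number of choices. A question line that itself starts with a single letter and a period would be stripped as well, so adjust the pattern if that can occur in your data.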
hwnd

I would suggest iterating over each line, splitting on the EOL (\r\n or \n). Then you can check each line to see whether it is a question, an answer choice, or the answer key. If you want it all in one array, you can use array_push().

I also found this question, which might be helpful: How to put string in array, split by new line?

Basically just splitting each line into an array as mentioned:

$array = preg_split ('/$\R?^/m', $string);
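The line-by-line classification suggested in this answer could look roughly like this in JavaScript. This is only a sketch: it assumes choices look like "a. text" and the answer line looks like "ANS: C", and parseQuestion is a hypothetical name.

```javascript
// Hypothetical parser: walk the lines and sort each one into
// question, choice, or answer.
function parseQuestion(text) {
  const result = { question: "", choices: [], answer: null };
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines between items

    const answer = line.match(/^ANS:\s*(\S+)/i);
    const choice = line.match(/^([a-z])\.\s+(.*)$/i);
    if (answer) {
      result.answer = answer[1];
    } else if (choice) {
      result.choices.push(choice[2]);
    } else {
      // Anything else is treated as (part of) the question text.
      result.question += (result.question ? " " : "") + line;
    }
  }
  return result;
}

const parsed = parseQuestion(
  "What is 1 + 1?\n\na. one\n\nb. two\n\nc. three\n\nANS: B\n"
);
console.log(parsed);

// Flatten to the single array the question asked for.
const flat = [parsed.question, ...parsed.choices, "ANS: " + parsed.answer];
console.log(flat);
```

Keeping the three kinds of line apart makes it easier to validate the data, for example checking that the answer letter actually corresponds to one of the choices, before flattening it into one array.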

Twisty