1

That's right. Unlike most questions, I am not trying to write a regular expression myself. I am trying to generate a regular expression (JavaScript flavoured, to be used in HTML5's pattern attribute).

Given an array of numbers, produce a concise, fast, and correct regular expression that matches only the numbers in that array. I have already done part of the job, namely the single digits [0-9]:

var ones = [0, 1, 2, 3, 4, 5, 8, 9],
    onesRegex = "";
for (var i = 0; i < ones.length; i++) {
  var e = ones[i];
  if (i > 0 && e == ones[i - 1] + 1) {
    // Close the range when the next element does not continue the run
    if (e != ones[i + 1] - 1) {
      onesRegex += e + "]";
    }
  } else {
    if (onesRegex != "") onesRegex += "|";
    onesRegex += "[" + e + "-";
  }
}

// Returns [0-5]|[8-9]
alert(onesRegex);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

This could then be used in an <input> (yes, jQuery is allowed):

$("input").attr("pattern", onesRegex);

The problem is that I am not sure how to continue. Single digits are easy enough, as you can see above. However, things get increasingly difficult as soon as you add digits, because there is so much more to take into account. For instance, you can have [112, 358, 359, 360, 361], which should result in (112|(3(5[8-9]|6[0-1]))), which is already quite long for only five numbers.
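A generated pattern like the one above can be sanity-checked by brute force before it is put on an input. The sketch below is only an illustration: `matchedNumbers` is a hypothetical helper name, and it only tries the plain decimal strings "0" to "999" (no leading zeros). It wraps the pattern in `^(?:...)$` because the `pattern` attribute has to match the whole value.

```javascript
// Hypothetical helper: returns every integer in [0, max] that the pattern accepts.
// The pattern is wrapped in ^(?:...)$ to mimic the whole-value matching
// that the HTML pattern attribute performs.
function matchedNumbers(pattern, max) {
  var re = new RegExp("^(?:" + pattern + ")$"),
    hits = [];
  for (var n = 0; n <= max; n++) {
    if (re.test(String(n))) hits.push(n);
  }
  return hits;
}

var expected = [112, 358, 359, 360, 361];
var found = matchedNumbers("(112|(3(5[8-9]|6[0-1])))", 999);

// Compare the two lists as strings; true means the pattern accepts
// exactly the expected numbers within the tested range.
console.log(found.join(",") === expected.join(","));
```

Running the same check against the output of a generator function makes it easy to spot patterns that accept too much or too little.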

For my project, the maximum value is 500, so all values < 1000 should be parsable.

I have written quite a bit, but there's a lot left to do -- I need to get the logic behind it. So far my idea is to split the numbers into ones, tens, and hundreds, and treat them accordingly. Additionally, each function can waterfall down to the others. For instance, parsing the number 512 could split it into 5 and 12; 12 then goes down to the function for tens, and so on. That's the main idea, but the logic and structure are missing.

Here is what I have so far, but I also provide a JSFiddle which is a bit easier to work with.

var arr = [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 105, 106, 107, 256, 257, 258, 259, 260],
  onesArray = [],
  tensArray = [],
  hundredsArray = [];

// MAIN
function regexGenerator() {
  orderSplitter(arr);
  // Do stuff
  // Should return finished Regex as a string
}

// Split input array in ones (1 digit), tens (2 digits), hundreds (3 digits)
function orderSplitter(numberArray) {
  $(numberArray).each(function(index, element) {
    if (element < 10) {
      onesArray.push(element);
    } else if (element < 100 && element > 9) {
      tensArray.push(element);
    } else if (element < 1000 && element > 99) {
      hundredsArray.push(element);
    }
  });
}

/* Following functions expect an array as input */
function onesToRegex(ones) {
  var onesRegex = "";
  for (var i = 0; i < ones.length; i++) {
    var e = ones[i];
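    if (i > 0 && e == ones[i - 1] + 1) {
      // Compare the element (not the index) with the next element
      if (e != ones[i + 1] - 1) {
        onesRegex += e + "]";
      }
    } else {
      if (onesRegex != "") onesRegex += "|";
      onesRegex += "[" + e + "-";
    }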
  }
  return onesRegex;
}

function tensToRegex(tens) {
  var tensRegex = "";
  for (var j = 0; j < tens.length; j++) {
    var f = tens[j],
      ten = Math.floor(f / 10),
      one = f - (ten * 10);

  }

  return tensRegex;
}

function hundredsToRegex(hundreds) {
  var hundredsRegex = "";
  for (var k = 0; k < hundreds.length; k++) {
    var g = hundreds[k],
      hundred = Math.floor(g / 100),
      ten = Math.floor((g - (hundred * 100)) / 10),
      one = g - (hundred * 100) - (ten * 10);

  }

  return hundredsRegex;
}
Bram Vanroy

3 Answers

1

As an alternative approach, consider using HTML5 <datalist>. This can be generated in JavaScript too.

var arr = [.......];

var datalist = document.createElement('datalist');
arr.forEach(function(num) {
    var option = document.createElement('option');
    option.value = num;
    datalist.appendChild(option);
});
datalist.id = "numberlist";
document.body.appendChild(datalist);

// apply to input
someInputElement.setAttribute("list","numberlist");

Here's a demo for you: https://jsfiddle.net/960sjuhc/

Niet the Dark Absol
  • I don't think this is a user-friendly approach when you can have up to 500 values. – Bram Vanroy May 05 '16 at 17:25
  • It's more user-friendly than having potentially 500 numbers and no indication of what they are ;) – Niet the Dark Absol May 05 '16 at 17:30
  • I don't understand what you mean by that. – Bram Vanroy May 05 '16 at 19:49
  • Well, how does the user know what numbers they are allowed to type? – Niet the Dark Absol May 05 '16 at 19:58
  • The array isn't fetched from an input field. The data is fetched from somewhere else. – Bram Vanroy May 05 '16 at 20:01
  • You misunderstand me. Your question about creating a `pattern` attribute for an input means that somewhere in there the user will be typing something in, right? How does the user know which numbers are OK to type and which ones are not? – Niet the Dark Absol May 05 '16 at 20:06
  • Well, that's what the `pattern` attribute is for, right? :D Together with the `title` attribute you can give feedback to the user. If that's not your question: the idea is that there's a table with indices. A user can filter these rows. The input field is a "scroll to index" field. When some filters are on (and some rows are hidden) the scroll-to field should not accept the hidden rows. Therefore, I thought it'd be fun to only allow the visible rows in the input field. (Users with unsupported browsers: their loss.) – Bram Vanroy May 05 '16 at 20:09
  • Ah, I see. In that case, it might be best to take the "validation" code from my Fiddle (the bit that reads the list to set the custom validity) and leave it at that. A custom validity setup is much more efficient in this case than a `pattern`. – Niet the Dark Absol May 05 '16 at 20:37
  • Oh, I didn't know about the custom validity. That's so cool! I am going to try it out now! +1 – Bram Vanroy May 06 '16 at 08:54
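The custom validity approach discussed in the comments above can be sketched roughly as follows. This is not the code from the Fiddle: `makeListValidator`, the `scroll-to` id, and the error message are made-up names for illustration. The only browser API it relies on is the input element's `setCustomValidity` method, where an empty string means "valid". Treating an empty value as valid is an assumption, chosen to mirror how `pattern` leaves empty fields to `required`.

```javascript
// Hypothetical helper: builds a validator for a fixed list of allowed numbers.
function makeListValidator(allowed, message) {
  var lookup = {};
  allowed.forEach(function (n) {
    lookup[String(n)] = true;
  });

  return function validate(input) {
    // Assumption: an empty value is left to the `required` attribute.
    var ok = input.value === "" || lookup.hasOwnProperty(input.value);
    // An empty string marks the field as valid; any other string is the error text.
    input.setCustomValidity(ok ? "" : message);
    return ok;
  };
}

// Browser wiring, assuming an <input id="scroll-to"> exists on the page.
if (typeof document !== "undefined") {
  var field = document.getElementById("scroll-to");
  var validate = makeListValidator([0, 1, 2, 5, 8], "That row is currently hidden.");
  if (field) {
    field.addEventListener("input", function () {
      validate(field);
    });
  }
}
```

When the filters change, building a new validator from the currently visible indices is enough; no regular expression has to be regenerated.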
0

A proposal with a tree.

Basically it has two parts:

  1. Build a tree as a nested object. The first-level key is the number of digits in a number; below that, every level has one property per digit, and the value of that property is the object holding the next digit.

  2. Build the regular expression string by iterating over the first-level keys (the lengths) and calling iter on each subtree, passing the remaining depth (length minus one), which is decremented at every level.
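To make step 1 concrete, here is a small sketch that builds only the tree and prints it. It uses the same idea as the tree-building loop in `getRegex`, pulled out into a hypothetical `buildTree` function so that the intermediate structure can be inspected on its own.

```javascript
// Hypothetical standalone version of the tree-building step.
function buildTree(numbers) {
  var tree = {};
  numbers.forEach(function (n) {
    var s = String(n);
    // First level: group by number of digits
    var node = tree[s.length] = tree[s.length] || {};
    // Deeper levels: one property per digit, each holding the next level
    s.split("").forEach(function (digit) {
      node = node[digit] = node[digit] || {};
    });
  });
  return tree;
}

console.log(JSON.stringify(buildTree([7, 12, 13])));
// {"1":{"7":{}},"2":{"1":{"2":{},"3":{}}}}
```

Here 12 and 13 share the branch for the leading digit 1, which is what later allows them to be collapsed into a single alternative with a character class for the last digit.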

function getRegex(array) {
    function group(array) { // take array [0,1,3,4,5,6,8] return string '013-68'
        return array.reduce(function (r, a, i, aa) {
            if (!i || aa[i - 1] + 1 !== a) {
                return r.concat([[a]]);
            }
            r[r.length - 1][1] = a;
            return r;
        }, []).map(function (a) {
            return a.join(a[0] + 1 === a[1] ? '' : '-');
        }).join('');
    }

    function iter(o, l) { // iterate an object
        // get all keys from the object as sorted numbers
        var keys = Object.keys(o).map(Number).sort(function (a, b) { return a - b; });

        if (keys.length === 1) { // if just one key return the key and get the next level
            return keys[0] + iter(o[keys[0]], l - 1);
        }
        if (keys.length > 1) { // if more than one key
            // test the level
            // if next level
            // return parenthesis with all keys and their next levels separated with |
            // if no level
            // return grouped keys with brackets around
            return l ?
                '(' + keys.map(function (k) { return k + iter(o[k], l - 1); }).join('|') + ')' :
                '[' + group(keys) + ']';
        }
        return '';
    }

    var tree = {};
    array.forEach(function (a) {
        var o, s = a.toString();
        tree[s.length] = tree[s.length] || {};
        o = tree[s.length];
        s.split('').forEach(function (b) {
            o[b] = o[b] || {};
            o = o[b];
        });
    });

    return '(' + Object.keys(tree).map(function (k) { return iter(tree[k], +k - 1); }).join('|') + ')';
}

document.write('<pre>' + getRegex([0, 1, 2, 3, 4, 5, 8, 9]) + '</pre>');
document.write('<pre>' + getRegex([100, 200, 212, 213, 214, 357]) + '</pre>');
document.write('<pre>' + getRegex([112, 358, 359, 360, 361]) + '</pre>');
document.write('<pre>' + getRegex([0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 105, 106, 107, 256, 257, 258, 259, 260]) + '</pre>');
Nina Scholz
  • Could you provide this with some comments (inline perhaps) of what is actually happening? Just dropping a chunk of code isn't really educational for me. I wrote my own answer, but I am interested in understanding yours. – Bram Vanroy May 05 '16 at 19:50
-1

As was pointed out in the comments, I should have some fun with it, so I followed that advice and here we are! My solution is probably not as efficient as Nina Scholz's answer (not tested, but that answer just looks more... detailed), but it is more readable in my opinion and it was a lot of fun to make -- and not nearly as hard as I had thought, once I got my head around it.

It's on JSFiddle, and also here as a snippet. I did my best to comment the somewhat harder parts, but most of it should be quite straightforward, though in retrospect I could have chosen better variable names. Comments are welcome!

var arr = [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 105, 106, 107, 256, 257, 258, 259, 260],
  onesArray = [],
  tensArray = [],
  tensMultiArray = {},
  hundredsArray = [],
  hundredsMultiArray = {};

// MAIN
function regexGenerator(arr) {
  orderSplitter(arr);

  var onesRegexString = onesToRegex(onesArray);
  var tensRegexString = tensToRegex(tensArray);
  var hundredsRegexString = hundredsToRegex(hundredsArray);

  // Don't forget start/end points ^$
  var regex = "(^(" + onesRegexString + ")$)|(^(" + tensRegexString + ")$)|(^(" + hundredsRegexString + ")$)";

  $(".result code").text(regex);
}

regexGenerator(arr);

// Split input array in ones (1 digit), tens (2 digits), hundreds (3 digits)
// Can be extended to include others
function orderSplitter(numberArray) {
  $(numberArray).each(function(index, element) {
    if (element < 10) {
      onesArray.push(element);
    } else if (element < 100 && element > 9) {
      tensArray.push(element);
    } else if (element < 1000 && element > 99) {
      hundredsArray.push(element);
    }
  });
}

/* Following functions expect an array as input */
function onesToRegex(ones) {
  var onesRegex = "";
  for (var i = 0; i < ones.length; i++) {
    var e = ones[i],
      // True if this element is not the first one and equals the previous + 1
      continuesRun = i > 0 && e == (ones[i - 1] + 1),
      // True if this element is NOT equal to the next element - 1
      // Also true if the next item does not exist
      endsRun = e != (ones[i + 1] - 1);

    if (continuesRun) {
      if (endsRun) onesRegex += "-" + e + "]";
    }
    // If this item is a (new) first item in a list
    else {
      if (onesRegex != "") onesRegex += "|";
      // An isolated digit is emitted on its own, without brackets
      onesRegex += endsRun ? e : "[" + e;
    }
  }
  return onesRegex;
}

function tensToRegex(tens) {
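  // Local map (shadows the global of the same name) so that nested calls
  // from hundredsToRegex do not pick up entries left by earlier calls
  var tensRegex = "",
    tensMultiArray = {};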

  // Loop the array and break the number down in digits
  // E.g. 13 -> ten = 1; one = 3
  $(tens).each(function(index, element) {
    var ten = Math.floor(element / 10),
      one = element - (ten * 10);

    // Push items to associative arrays (objects)
    if (!(ten in tensMultiArray)) {
      tensMultiArray[ten] = [one];
    } else {
      tensMultiArray[ten].push(one);
    }
  });

  var i = 0;
  for (var ten in tensMultiArray) {
    if (tensMultiArray.hasOwnProperty(ten)) {
      // Each iteration is a new number, meaning it is an *alternative*
      // Hence the pipe
      if (i > 0) tensRegex += "|";
      tensRegex += ten;

      // The `one` digits belonging to ten (e.g. 1 and 2 for 11 and 12) is an array
      // Therefore we can send it down to onesToRegex to be processed
      if (tensMultiArray[ten].length > 1) {
        tensRegex += "(" + onesToRegex(tensMultiArray[ten]) + ")";
      } else {
        tensRegex += tensMultiArray[ten][0];
      }

      i++;
    }
  }
  return tensRegex;
}

function hundredsToRegex(hundreds) {
  var hundredsRegex = "";
  // Loop the array and break the number down in hundreds and rest
  // E.g. 128 -> hundred = 1; rest = 28
  $(hundreds).each(function(index, element) {
    var hundred = Math.floor(element / 100),
      rest = element - (hundred * 100);

    // Push items to associative arrays (objects)
    if (!(hundred in hundredsMultiArray)) {
      hundredsMultiArray[hundred] = [rest];
    } else {
      hundredsMultiArray[hundred].push(rest);
    }
  });

  var i = 0;
  for (var hundred in hundredsMultiArray) {
    if (hundredsMultiArray.hasOwnProperty(hundred)) {
      // Each iteration is a new number, meaning it is an *alternative*
      // Hence the pipe
      if (i > 0) hundredsRegex += "|";
      hundredsRegex += hundred;

      // The `rest` digits belonging to hundred (e.g. 28 and 29 for 128 and 129) 
      // is an array. Therefore we can send it down to tensToRegex to be processed
      // In turn, tensToRegex will also send its ones through to onesToRegex
      if (hundredsMultiArray[hundred].length > 1) {
        hundredsRegex += "(" + tensToRegex(hundredsMultiArray[hundred]) + ")";
      } else {
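        // Pad the rest to two digits, e.g. 105 -> "1" + "05"
        var single = hundredsMultiArray[hundred][0];
        hundredsRegex += (single < 10 ? "0" : "") + single;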
      }

      i++;
    }
  }
  return hundredsRegex;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<h1>
Generate Regular Expression based on an input array
</h1>

<p>
  In this example the input is <code>[0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 105, 106, 107, 256, 257, 258, 259, 260]</code>. The result is:
</p>
<p class="result"><code></code></p>
Bram Vanroy