I have a list of all Shakespeare sonnets and I'm making a function to search for each sonnet. However, I want to be able to search them using Arabic numbers (for example, "/sonnet 122"). The .txt is formatted this way:

I

This is a sonnet

II

This is a second sonnet

I am using node right now to try to do it, but I've been trying since yesterday to no avail. My last attempts yesterday were using the 'replace' method as such:

'use strict';
//require module roman-numerals, which converts roman to arabic
var toArabic = require('roman-numerals').toArabic;
//require file-handling module
var fs = require('fs');

fs.readFile('sonn.txt', 'utf8', function (err,data) {
    if (err) {
        console.log(err);
    } else {
        var RN = /[A-Z]{2,}/g; 
        var found = data.match(RN); //finds all roman numbers and puts them in an array
        var numArr = [];
        for (var i = 0; i < found.length; i++ ){
            numArr.push(toArabic(found[i])); //puts all arabic numbers in numArr
        }
        for (var e = 0; e < found.length; e++){
            data.replace(found, found.forEach((x, i)=> {
                toArabic(x)
            }));
        }
    }
});

Then I tried replacing them with:

data.replace(found, function(s, i){
    return numArr[i];
});

Then I tried with a for loop. I didn't keep that code, but it was something like:

for(var i=0;i<found.length;i++){
    data.replace(found, numArr[i]);
}

The last code replaces each number and then erases the data and replaces the next number as such:

replace(abc, 123) -> 1bc, a2c, ab3

How do I make it iterate each occurrence in the data and keep it? Then saving it to a new txt should be easy.

(Also, my RegExp only finds multiple character roman numbers to avoid replacing lonely I's that could be found at the end of a line.)

Besto
  • So you're trying to convert roman numerals to regular digits? How about -> http://blog.stevenlevithan.com/archives/javascript-roman-numeral-converter – adeneo Nov 21 '16 at 15:37
  • Here's a bunch of examples in Java -> http://stackoverflow.com/questions/9073150/converting-roman-numerals-to-decimal – adeneo Nov 21 '16 at 15:40
  • @adeneo the module "roman-numerals" converts them, but it only takes strings as values, so I have to use it inside the replace function or inside the loop so that each iteration gives it a string. My problem is not converting but replacing all occurrences in the original string, which is the .txt file. – Besto Nov 21 '16 at 15:48
  • Could your search function not convert the number into roman numerals rather than replacing them in the .txt? EG: The search request `/sonnet 122` is converted to `/sonnet CXXII`. Seems somewhat simpler than refactoring the .txt. – Moob Nov 21 '16 at 16:45
  • A horse, a horse... My kingdom for a horse. – Kurt Van den Branden Dec 19 '16 at 11:20
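
Moob's suggestion in the comments above (convert the query rather than the file) can be sketched as follows. This is only an illustration: `toRoman` and `findSonnet` are hypothetical helpers written for this example, not part of the question's code, and the sketch assumes every sonnet heading sits alone on its own line and that no line of verse consists solely of Roman-numeral letters.

```javascript
// Value/symbol pairs in descending order, including the subtractive forms.
const ROMAN_PAIRS = [
  [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
  [100, 'C'], [90, 'XC'], [50, 'L'], [40, 'XL'],
  [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']
];

// Hypothetical helper: converts an integer from 1 to 3999 to a Roman numeral.
function toRoman(n) {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError('toRoman expects an integer from 1 to 3999');
  }
  let out = '';
  for (const [value, symbol] of ROMAN_PAIRS) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Hypothetical helper: returns the text between the heading for `number`
// and the next heading, or null if the heading is not found.
function findSonnet(text, number) {
  const lines = text.split(/\r?\n/);
  const start = lines.indexOf(toRoman(number));
  if (start === -1) return null;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    // Assumption: a line made only of numeral letters is the next heading.
    if (/^[IVXLCDM]+$/.test(lines[i])) {
      end = i;
      break;
    }
  }
  return lines.slice(start + 1, end).join('\n').trim();
}
```

With this approach a request such as "/sonnet 122" would look up the heading `toRoman(122)` and the .txt file stays untouched.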

3 Answers

If you use String.prototype.replace, you can use your regular expression and a custom replacement function. You just need to return the value to use as a replacement, which is what toArabic does.

var data = 'I\n\nThis is a sonnet\n\nII\n\nThis is a second sonnet';

//========================

var toArabic = (function () {
  var forEach = Array.prototype.forEach;


  /**
   * Converts a roman number to its arabic equivalent.
   *
   * Will throw TypeError on non-string inputs.
   *
   * @param {String} roman
   * @return {Number}
   */
  function toArabic (roman) {
    if (('string' !== typeof roman) && (!(roman instanceof String))) throw new TypeError('toArabic expects a string');

    // Zero is/was a special case. I'll go with Dionysius Exiguus on this one as
    // seen on http://en.wikipedia.org/wiki/Roman_numerals#Zero
    if (/^nulla$/i.test(roman) || !roman.length) return 0;

    // Ultra magical regexp to validate roman numbers!
    roman = roman.toUpperCase().match(/^(M{0,3})(CM|DC{0,3}|CD|C{0,3})(XC|LX{0,3}|XL|X{0,3})(IX|VI{0,3}|IV|I{0,3})$/);
    if (!roman) throw new Error('toArabic expects a valid roman number');
    var arabic = 0;

    // Crunching the thousands...
    arabic += roman[1].length * 1000;

    // Crunching the hundreds...
    if (roman[2] === 'CM') arabic += 900;
    else if (roman[2] === 'CD') arabic += 400;
    else arabic += roman[2].length * 100 + (roman[2][0] === 'D' ? 400 : 0);


    // Crunching the tens
    if (roman[3] === 'XC') arabic += 90;
    else if (roman[3] === 'XL') arabic += 40;
    else arabic += roman[3].length * 10 + (roman[3][0] === 'L' ? 40 : 0);

    // Crunching the...you see where I'm going, right?
    if (roman[4] === 'IX') arabic += 9;
    else if (roman[4] === 'IV') arabic += 4;
    else arabic += roman[4].length * 1 + (roman[4][0] === 'V' ? 4 : 0);
    return arabic;
  };
  return toArabic;
})();

//====================

var RN = /[A-Z]{1,2}(?=\n)/g;
var newData = data.replace(RN, toArabic);
document.body.innerText = newData;
JstnPwll
  • Got things like CXLV2 CXL6 CX55 XCVI2 (III got converted to I2). Other than that I read your code and it's magnificent. – Besto Nov 21 '16 at 16:29
  • Right, I wasn't trying to pick which regex syntax you should use. Just showing you how the replace works. – JstnPwll Nov 21 '16 at 17:06
  • Aaaah, alright. Sorry for misinterpreting you. In the end I did it exactly as you proposed, but taken from adeneo's answer. I guess if I had understood your purpose it would have gone right earlier. :P – Besto Nov 21 '16 at 17:09
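
On the "III got converted to I2" result reported in the comments: `/[A-Z]{1,2}(?=\n)/g` matches only the last one or two capitals before a newline, so the final "II" of "III" is converted and the leading "I" is left behind. One possible alternative, shown here only as a sketch, is to anchor the pattern to whole lines with the `m` flag, which also picks up a lone "I" heading without touching an "I" at the end of a verse line. `toArabicSimple` is a stand-in written for this example (it assumes well-formed numerals and does no validation); in the thread's code the `roman-numerals` module's `toArabic` would take its place.

```javascript
// Stand-in converter (assumption: input is a well-formed Roman numeral).
const NUMERAL_VALUES = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };

function toArabicSimple(roman) {
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const current = NUMERAL_VALUES[roman[i]];
    const next = NUMERAL_VALUES[roman[i + 1]] || 0;
    // A smaller value in front of a larger one is subtracted (IV, XL, CM...).
    total += current < next ? -current : current;
  }
  return total;
}

// With the m flag, ^ and $ match at line boundaries, so only lines made up
// entirely of numeral letters are replaced.
const HEADING_LINE = /^[IVXLCDM]+$/gm;

function convertHeadings(text) {
  return text.replace(HEADING_LINE, function (match) {
    return String(toArabicSimple(match));
  });
}
```

The remaining caveat is a line of verse that consists of nothing but numeral letters (a line reading just "I", say), which this pattern would also convert.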

You have to assign the result of replace() back to the variable, and you can pass a callback to replace():

'use strict';

var toArabic = require('roman-numerals').toArabic;
var fs = require('fs');

fs.readFile('sonn.txt', 'utf8', function (err,data) {
    if (err) {
        console.log(err);
    } else {
        data = data.replace(/[A-Z]{2,}/g, function(x) {
            return toArabic(x);
        });
    }
});

Here are some more regular expressions to match Roman numerals.
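
The question also mentions saving the result to a new .txt file. A minimal sketch of that last step, assuming synchronous I/O is acceptable for a file of this size: `convertFileSync` and its parameters are hypothetical names for this example, and `convert` stands for whatever converter is in use (for instance the `toArabic` function required above).

```javascript
const fs = require('fs');

// Hypothetical helper: reads inputPath, replaces every run of two or more
// capital letters with convert(match), and writes the result to outputPath.
function convertFileSync(inputPath, outputPath, convert) {
  const data = fs.readFileSync(inputPath, 'utf8');
  // replace() returns a new string, so the result must be captured.
  const converted = data.replace(/[A-Z]{2,}/g, function (match) {
    return String(convert(match));
  });
  fs.writeFileSync(outputPath, converted, 'utf8');
  return converted;
}
```

Note that `/[A-Z]{2,}/g` also matches any all-caps word in the text, so a strict converter may throw on input that is not a numeral.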

adeneo
  • Dude, this did it, are you a wizard? Hahaha. – Besto Nov 21 '16 at 16:31
  • I think the main difference was you re-declared "data". – Besto Nov 21 '16 at 16:33
  • @Besto - you're welcome. Strings are immutable, so you always have to "write them back", also this just uses the global regex directly in the replace, and then the callback returns the value from the `toArabic` function. Putting everything in arrays is a lot more complicated to get right. – adeneo Nov 21 '16 at 16:36
  • As a sidenote, you could just do `data = data.replace(/[A-Z]{2,}/g, toArabic)` as well, I added the anonymous function to make it a little more verbose – adeneo Nov 21 '16 at 16:37
  • Ah! tried that one too and it also works haha, ohwell, I didn't know you could call functions without parentheses like that. Thanks for the teaching. :) – Besto Nov 21 '16 at 17:00
  • @Besto - you can reference the function, and the arguments will be passed automatically, which is why it works here, the arguments are in the same place etc. – adeneo Nov 21 '16 at 18:05
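
The two points made in these comments (strings are immutable, and `replace()` passes arguments to a function reference automatically) can be illustrated with a short sketch. `collectCallbackArgs` is a hypothetical helper for this example only. The `parseInt` line is included as a caution: a function reference receives the match offset as its second argument, which is harmless for `toArabic` but changes what `parseInt` returns.

```javascript
// Strings are immutable: replace() never changes the receiver.
const original = 'II';
original.replace('II', '2');                  // result discarded, nothing changes
const updated = original.replace('II', '2');  // result kept in a new variable

// Hypothetical helper: records what replace() hands to its callback.
function collectCallbackArgs(text, pattern) {
  const calls = [];
  text.replace(pattern, function (match, offset) {
    calls.push({ match: match, offset: offset });
    return match;
  });
  return calls;
}

// Each call receives the matched text first, then its offset in the input.
const calls = collectCallbackArgs('II and XII', /[A-Z]{2,}/g);

// Caution: parseInt treats the offset as a radix, so the second "10"
// (at offset 3) is parsed in base 3.
const surprising = '10 10'.replace(/\d+/g, parseInt);
```

Wrapping the call in an anonymous function, as in the answer, avoids this class of surprise because only the first argument is forwarded.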

This kind of thing is best handled as a stream transform. The old node stream transform library is a bit funky to initialize, but it gets the job done very fast and well. Here's a working example using the replace function that @adeneo wrote above.

var stream = require('stream');
var util = require('util');
var toArabic = require('roman-numerals').toArabic;
var fs = require('fs');

var rstream = fs.createReadStream('sonnets.txt');
var wstream = fs.createWriteStream('sonnets.transformed.txt');

// node v0.10+ use native Transform, else polyfill
var Transform = stream.Transform ||
  require('readable-stream').Transform;

function Converter(options) {
    // allow use without new
    if (!(this instanceof Converter)) {
        return new Converter(options);
    }

    // init Transform
    Transform.call(this, options);
}

util.inherits(Converter, Transform);

Converter.prototype._transform = function (chunk, enc, cb) {

    //transform the chunk
    var data = chunk.toString().replace(/[A-Z]{2,}/g, function(x) {
            return toArabic(x);
        });

    this.push(data); //push the chunk

    cb(); //callback

};


// try it out
var converter = new Converter();

// now run it on the whole file
rstream
    .pipe(converter)
    .pipe(wstream)  // writes to sonnets.transformed.txt
    .on('finish', function () {  // finished
        console.log('done transforming');
     });

This is pretty well covered here: http://codewinds.com/blog/2013-08-20-nodejs-transform-streams.html and here, with more modern examples using the through2 transform library: https://github.com/substack/stream-handbook

4m1r
  • This works too :) What would you say are the pros and cons of using this instead of adeneo's simple replace method? – Besto Nov 21 '16 at 16:45
  • I mean, adeneo's answer isn't changed. What are the pros and cons of using this instead of the 'fs' module? – Besto Nov 21 '16 at 16:46
  • Not too much of a problem when you're dealing with one file, but depending on file size, there are memory limitations when not using streams because you have to buffer the entire file before working on the lines. https://github.com/substack/stream-handbook#why-you-should-use-streams – 4m1r Nov 21 '16 at 16:47
  • oh and you still are using the fs module, which ultimately returns a buffer or buffer stream. also this is a more extendable design, say you need a function to also transform the capital letters of each line. you simply write another converter and pipe it into the stream. – 4m1r Nov 21 '16 at 16:48
  • Oh! Interesting! So it is indeed a very flexible system. I'll be sure to read more about streams, this sounds like it could be used for when I turn to mass production and transformation of files. Thank you very much for your explanations. :) – Besto Nov 21 '16 at 16:55
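
One caveat with per-chunk replacement in a stream: a chunk boundary can in principle fall in the middle of a numeral, so "XII" might arrive as "X" and "II" and be converted incorrectly. A possible way around that, sketched here under the assumption that the file uses "\n" line endings, is to buffer the trailing partial line and convert only complete lines. `createLineBuffer` and `createConverterStream` are hypothetical names for this example, and `convertLine` is whatever per-line conversion is wanted.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: emits only complete lines and holds back the trailing
// partial line until more data (or the end of input) arrives.
function createLineBuffer(convertLine) {
  let carry = '';
  return {
    feed(text) {
      const lines = (carry + text).split('\n');
      carry = lines.pop(); // the last piece may be an incomplete line
      return lines.map((line) => convertLine(line) + '\n').join('');
    },
    flush() {
      const rest = carry;
      carry = '';
      return rest === '' ? '' : convertLine(rest);
    }
  };
}

// Wraps the line buffer in a Transform stream. StringDecoder keeps multi-byte
// UTF-8 characters intact when they are split across chunks.
function createConverterStream(convertLine) {
  const decoder = new StringDecoder('utf8');
  const buffer = createLineBuffer(convertLine);
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, buffer.feed(decoder.write(chunk)));
    },
    flush(callback) {
      callback(null, buffer.feed(decoder.end()) + buffer.flush());
    }
  });
}
```

It would be piped the same way as the Converter in the answer, for example `rstream.pipe(createConverterStream(fn)).pipe(wstream)`, where `fn` converts a heading line and returns any other line unchanged.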