I'm reading a text file and converting it to JSON using a regex in my React project. It works, but the output does not include the last 20-30 lines of the text file. Something goes wrong during the conversion to JSON, but I can't work out what.

Here is my code:

    readTextFile = file => {
        let rawFile = new XMLHttpRequest();
        rawFile.open("GET", file, false);
        rawFile.onreadystatechange = () => {
            if (rawFile.readyState === 4) {
                if (rawFile.status === 200 || rawFile.status === 0) {
                    let allText = rawFile.responseText;
                    // console.log(allText)

                    let reg = /\d\d\d\d-(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01]) (00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])/g;

                    let arr = [];
                    let start = null;
                    let line, lastSpacePos;
                    let match;
                    while ((match = reg.exec(allText)) != null) {
                        if(start) {
                            line = allText.slice(start, match.index).trim();
                            lastSpacePos = line.lastIndexOf(' ');
                            arr.push({
                                date: line.slice(0, 19),
                                text: line.slice(20, lastSpacePos).trim(),
                                user_id: line.slice(lastSpacePos).trim()
                            });
                        }

                        start = match.index
                    }
                    console.log(arr);

                    this.setState({
                        // text: JSON.stringify(arr)
                        text: allText
                    });
                }
            }
        };
        rawFile.send(null);
    };

1 Answer

I am not certain what the issue is with the existing code in the question.
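One thing does stand out in the question's loop, offered as an observation rather than a confirmed diagnosis: a record is only pushed once the next date has been found, so whatever follows the final date is never pushed, and `if(start)` also skips a first record that begins at index 0. Whether that accounts for all of the missing lines depends on the actual file. A minimal sketch of the same slicing idea that keeps both ends, assuming zero-padded timestamps as in the sample text (`parseRecords` is a hypothetical name, and collapsing whitespace is an addition not present in the question):

```javascript
// Hypothetical helper: the question's slicing logic, restructured so that the
// first and the last record are both kept.
// Assumption: timestamps are zero-padded, e.g. "2014-06-01 23:07:58".
function parseRecords(allText) {
  const reg = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/g;

  // Collect the start index of every timestamp first.
  const starts = [];
  let match;
  while ((match = reg.exec(allText)) !== null) {
    starts.push(match.index);
  }

  // Each record runs from its own timestamp to the next one,
  // or to the end of the text for the final record.
  return starts.map((start, i) => {
    const end = i + 1 < starts.length ? starts[i + 1] : allText.length;
    // Collapsing whitespace is an addition, not part of the question's code.
    const line = allText.slice(start, end).replace(/\s+/g, " ").trim();
    const lastSpacePos = line.lastIndexOf(" ");
    return {
      date: line.slice(0, 19),
      text: line.slice(20, lastSpacePos).trim(),
      user_id: line.slice(lastSpacePos).trim(),
    };
  });
}
```

Calling `parseRecords(allText)` in place of the `while` loop would yield one object per timestamp, including the record after the last date.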

To get the expected result described in the question with an alternative approach, you can use three regular expressions. `/\s{2,}|\n+/g` replaces runs of two or more whitespace characters, and newline characters, with a single space. `/[\d-]+\s[\d:]+/g` matches the dates. `/.+(?=\s\w+\s$|\s\w+$)|\w+\s$|\w+$/g` matches the text that precedes the final word at the end of the string (with or without a trailing space), and then that final word itself, which is the user id. Finally, `.map()` returns an object with a property set for each element of the resulting array.

    let allText = `2014-06-01 23:07:58 President Resigns in Georgia’s Breakaway Region of 
    Abkhazia t.co/DAploRvCvV                                                    nytimes 
    2014-06-01 23:48:06 The NYT FlipBoard guide to understanding climate 
    change and its consequences t.co/uPGTuYiSmQ                                 nytimes 
    2014-06-01 23:59:06 For all the struggles that young college grads 
    face, a four-year degree has probably never been more valuable 
    t.co/Gjf6wrwMsS         nytimes 
    2014-06-01 23:35:09 It's better to be a community-college graduate than 
    a college dropout t.co/k3CO7ClmIG                                           nytimes 
    2014-06-01 22:47:04 Share your experience with Veterans Affairs health 
    care t.co/PrDhLC20Bt                                                        nytimes 
    2014-06-01 22:03:27 Abandon Hope, Almost All Ye Who Enter the N.B.A. 
    Playoffs t.co/IQAJ5XNddR                                                    nytimes`;

    // replace runs of two or more whitespace characters, and newlines, with one space
    allText = allText.replace(/\s{2,}|\n+/g, " ");
    // get dates
    let dates = allText.match(/[\d-]+\s[\d:]+/g);
    // split on the dates to get the chunks that are not dates,
    // pair each chunk's text and user id with its date,
    // then return one object per record
    let res = allText
        .split(/[\d-]+\s[\d:]+\s/)
        .filter(Boolean)
        .map((text, index) =>
            [dates[index], ...text.match(/.+(?=\s\w+\s$|\s\w+$)|\w+\s$|\w+$/g)])
        .map(([date, text, user_id]) => ({date, text, user_id}));

    console.log(res);
guest271314
  • The date regex returns a wrong value for the first date field: "------- 2014" and "-06-01 23:07:58", but it needs to be "2014-06-01 23:07:58" – Srajan Rastogi Jan 16 '18 at 08:19
  • @SrajanRastogi What do you mean by "false value"? The first date is "2014-06-01 23:07:58" – guest271314 Jan 16 '18 at 08:21
  • I'm getting this error: "TypeError: Cannot convert undefined or null to object" for the line ".map((text, index) =>" – Srajan Rastogi Jan 16 '18 at 08:22
  • At stacksnippets? At which browser are you trying the code? – guest271314 Jan 16 '18 at 08:23
  • But it returns two different values in place of that date: "------- 2014" and "-06-01 23:07:58" – Srajan Rastogi Jan 16 '18 at 08:24
  • I'm trying to run it in Chrome as a whole package using Node. – Srajan Rastogi Jan 16 '18 at 08:25
  • The array of objects returned at Chromium 63 is `[{"date":"2014-06-01 23:07:58","text":"President Resigns in Georgia’s Breakaway Region of Abkhazia t.co/DAploRvCvV","user_id":"nytimes "},{"date":"2014-06-01 23:48:06","text":"The NYT FlipBoard guide to understanding climate change and its consequences t.co/uPGTuYiSmQ","user_id":"nytimes "},{"date":"2014-06-01 23:59:06","text":"For all the struggles that young college grads face, a four-year degree has probably never been more valuable t.co/Gjf6wrwMsS","user_id":"nytimes "}` – guest271314 Jan 16 '18 at 08:26
  • `,{"date":"2014-06-01 23:35:09","text":"It's better to be a community-college graduate than a college dropout t.co/k3CO7ClmIG","user_id":"nytimes "},{"date":"2014-06-01 22:47:04","text":"Share your experience with Veterans Affairs health care t.co/PrDhLC20Bt","user_id":"nytimes "},{"date":"2014-06-01 22:03:27","text":"Abandon Hope, Almost All Ye Who Enter the N.B.A. Playoffs t.co/IQAJ5XNddR","user_id":"nytimes"}` which is the corresponding parsing of the text at OP to `JSON` as described as the requirement. Not sure how nodejs is related to the original question? – guest271314 Jan 16 '18 at 08:27
  • When I run `console.log(dates)` it returns something like this: `["------- 2014", "-06-01 23:07:58", "2014-06-01 23:48:06", "2014-06-01 23:59:06", "2014-06-01 23:35:09", "2014-06-01 22:47:04", "2014-06-01 22:03:27", "2014-06-01 22:19:06", "2014-06-01 22:15:06", "2014-06-01 21:43:06", "2014-06-01 21:06:34", "2014-06-01 21:31:03"]` – Srajan Rastogi Jan 16 '18 at 08:35
  • The input text that you are trying the code with is different from the input text at question. `"------- "` does not appear at text at OP – guest271314 Jan 16 '18 at 08:36
  • Thanks, sorry about the confusion. I'm new to this and trying to learn. – Srajan Rastogi Jan 16 '18 at 08:37
  • Sorry, my bad. I added the link to the source file in the comment: link to the source file – Srajan Rastogi Jan 16 '18 at 08:38
  • See https://stackoverflow.com/help/how-to-ask, https://stackoverflow.com/help/mcve – guest271314 Jan 16 '18 at 08:38
  • You can include a `RegExp` to replace sequential occurrences of `"-"` character, or all characters before the input text at question, then use the code at answer. – guest271314 Jan 16 '18 at 08:40
  • I'll remove the repeated "-" occurrences, but why am I getting the error "TypeError: Cannot convert undefined or null to object" for the line ".map((text, index) =>"? Any idea? – Srajan Rastogi Jan 16 '18 at 08:42
  • No, did not get an error here using the same code with the input at original question. – guest271314 Jan 16 '18 at 08:43
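The two problems raised in the comments can be sketched together. This is only an illustration under stated assumptions, not a confirmed fix for the commenter's file: it assumes zero-padded timestamps, uses a stricter date pattern so a run of dashes followed by a year is not taken for a date, and discards everything before the first timestamp. Since `String.prototype.match` returns `null` when nothing matches, and spreading `null` throws a `TypeError`, the sketch also falls back to an empty array, which may or may not be the cause of the error reported above. `DATE_SOURCE`, `stripPreamble` and `toRecords` are hypothetical names.

```javascript
// Assumption: timestamps look like "2014-06-01 23:07:58" (zero-padded).
// Stricter than /[\d-]+\s[\d:]+/, which also matches "------- 2014".
const DATE_SOURCE = "\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}";

// Hypothetical helper: discard any header text before the first timestamp.
function stripPreamble(allText) {
  const first = allText.search(new RegExp(DATE_SOURCE));
  return first === -1 ? "" : allText.slice(first);
}

// Hypothetical helper: the answer's split/match idea, with null guards.
function toRecords(allText) {
  const cleaned = stripPreamble(allText).replace(/\s+/g, " ").trim();
  // match() returns null when there are no dates at all.
  const dates = cleaned.match(new RegExp(DATE_SOURCE, "g")) || [];
  return cleaned
    .split(new RegExp(DATE_SOURCE + "\\s?"))
    .map(chunk => chunk.trim())
    .filter(Boolean)
    .map((chunk, index) => {
      // First match: everything before the final word; second: the final word.
      const [text = "", user_id = ""] =
        chunk.match(/.+(?=\s\w+$)|\w+$/g) || [];
      return { date: dates[index], text, user_id };
    });
}
```

Because each chunk is trimmed before matching, `user_id` comes back without the trailing space that appears in the output quoted earlier in the comments.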