
I have a text file that consists of chapters and clauses. It's the constitution of Kenya.

I want to convert it to something similar to Flare.json, which looks like this:

    {"name": "ROOT",
     "children": [
         {"name": "Hemiptera",
          "children": [
              {"name": "Miridae",
               "children": [
                   {"name": "Kanakamiris", "children": []},
                   {"name": "Neophloeobia",
                    "children": [
                        {"name": "incisa", "children": []}
                    ]}
               ]}
          ]},
         {"name": "Lepidoptera",
          "children": [
              {"name": "Nymphalidae",
               "children": [
                   {"name": "Ephinephile",
                    "children": [
                        {"name": "rawnsleyi", "children": []}
                    ]}
               ]}
          ]}
     ]}

Is there a way I can do this programmatically in JavaScript, Python or R?

  • Can you give an example of the input? – Jul 03 '15 at 13:36
  • Is it the *formatting* of the JSON that you're interested in, or how to convert the source document to nested containers? – jonrsharpe Jul 03 '15 at 13:37
  • @jasonsharpe I'm interested in how to convert the source document into nested containers – Odanga Madung Jul 04 '15 at 15:47
  • @squint you can make the input in any format you want, simple is preferable – Odanga Madung Jul 04 '15 at 16:02
  • @Odanga Madung I'm curious about one thing (no need to answer if it's classified): how is Kenya's constitution in JSON going to help you in your program? – Muhammad Imran Jul 05 '15 at 05:33
  • @Muhammadimran I need to use it to do some data visualization in d3.js – Odanga Madung Jul 05 '15 at 08:13
  • @Muhammadimran aside from copy-pasting the text file into the nested arrays, do you have any ideas on how to convert the text in the PDF file into the nested array? – Odanga Madung Jul 05 '15 at 08:21
  • http://stackoverflow.com/questions/12066118/reading-pdf-file-using-javascript , also, http://stackoverflow.com/questions/1554280/extract-text-from-pdf-in-javascript , might help you. – Muhammad Imran Jul 05 '15 at 10:38
  • If doing it programmatically isn't possible or is difficult for you, one option is to copy all the text from your PDF file, paste it into Notepad++, and use find-and-replace based on certain patterns so that the text is converted to a nested array. Otherwise, process the text file into a nested array with JavaScript or any other programming language (like C#). By the way, instead of converting it to nested arrays, why not convert it directly to JSON there... – Muhammad Imran Jul 05 '15 at 10:47
  • @Muhammadimran I ran a test using the following array: `[[ "THE REPUBLIC OF KENYA", ["1. Kenya is a sovereign Republic. 1A. The Republic of Kenya shall be a multiparty democratic state."], ["2. The Public Seal of Kenya shall be such device as may be prescribed by or under an Act of Parliament. 2A. (Repealed by 12 of 1991, s. 2)."]]]`. But when I place a breakpoint on `return kenJSON;` I don't see the output, and nothing appears in the console either. How do I view the result? – Odanga Madung Jul 06 '15 at 12:45
  • @Odanga Madung write `console.log(JSON.stringify(kenJSON))` before `return kenJSON;` in function `kenyaConstitutionToJSON()`. – Muhammad Imran Jul 06 '15 at 13:40
  • @Muhammadimran still nothing is appearing in the console, have a look at my code in this jsfiddle http://jsfiddle.net/odanga/hj33zd0a/ – Odanga Madung Jul 06 '15 at 17:05
  • @OdangaMadung you didn't call the `` function from your script at all, so it never ran. Updated your code in http://jsfiddle.net/9s35yakk/ – Muhammad Imran Jul 07 '15 at 04:06
  • @OdangaMadung the array you have created is incorrect. Create it according to the template I have pasted below, like `[["chapter1",[["clause1", [["subclause1"], ["subclause2"]]],["clause2",[["subclause2-1"], ["subclause2-2"]]]]]];` – Muhammad Imran Jul 07 '15 at 04:08

1 Answer


First, let me propose an input format for you, something like:

    var kenyaConstitutionArray = ["1#SOVEREIGNTY OF CONSTITUTION", "1:1#All sovereign...", "1:2#...",....,"100#....","100:1#..."]

Here `1#` marks a chapter, `1:1#` the first sub-clause of chapter 1, and `1:1:1#` the first sub-sub-clause of clause 1:1. I've used `#` as the separator because I assume it will not appear in the text itself.
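If the constitution is available as plain text, a minimal sketch of producing entries in this format might look like the following. It assumes a hypothetical layout (chapter headings like `CHAPTER 1 - TITLE`, sub-clauses starting with `1.`, sub-sub-clauses starting with `(1)`); the real file will almost certainly need different patterns, and `linesToEntries` is an illustrative name, not an existing function.

```javascript
// Sketch only: converts plain-text lines into the "path#text" format above.
// ASSUMED layout (hypothetical, check it against your actual file):
//   "CHAPTER 1 - SOVEREIGNTY OF CONSTITUTION" -> chapter heading
//   "1. All sovereign power ..."              -> sub-clause of the current chapter
//   "(1) ..."                                 -> sub-sub-clause of the current sub-clause
// The file is assumed to start with a chapter heading.
function linesToEntries(lines) {
  var entries = [];
  var chapter = 0, clause = 0, sub = 0;
  lines.forEach(function (raw) {
    var line = raw.trim();
    var m;
    if (line === '') {
      return; // skip blank lines
    }
    if ((m = line.match(/^CHAPTER\s+(\d+)\s*-?\s*(.*)$/i))) {
      chapter = Number(m[1]);
      clause = 0;
      sub = 0;
      entries.push(chapter + '#' + m[2]);
    } else if ((m = line.match(/^\d+\.\s+(.*)$/))) {
      clause += 1; // numbered by position within the chapter, as in "1:1#"
      sub = 0;
      entries.push(chapter + ':' + clause + '#' + m[1]);
    } else if ((m = line.match(/^\(\d+\)\s+(.*)$/))) {
      sub += 1;
      entries.push(chapter + ':' + clause + ':' + sub + '#' + m[1]);
    }
    // Any other line (e.g. wrapped continuation text) is ignored in this sketch.
  });
  return entries;
}
```

Reading the file and splitting it with `text.split('\n')` gives the `lines` array this sketch expects.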

To get chapters and clauses, you need to do the following:

    var path = text.substr(0, text.indexOf('#')); // the part before '#' gives the path (levels)

Here, `text` is an element of the array, e.g. `text = kenyaConstitutionArray[1]`.

Next, get the chapter:

    var chapter = path.split(':')[0]; // "1" for both "1" and "1:2" (substr/indexOf would give "" when there is no ':')
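Putting the two steps together, a small helper can return every level of the path plus the text. This is a minimal sketch; `parseEntry` is a hypothetical name, and it uses `slice`/`split` instead of `substr`.

```javascript
// Hypothetical helper: split one "path#text" entry into its levels and its text.
function parseEntry(text) {
  var hash = text.indexOf('#');
  if (hash === -1) {
    throw new Error('Entry has no "#": ' + text);
  }
  var path = text.slice(0, hash);          // e.g. "1:2"
  return {
    levels: path.split(':'),               // e.g. ["1", "2"] (chapter first)
    body: text.slice(hash + 1)             // text after the first "#"
  };
}

// Usage:
var entry = parseEntry('1:2#All sovereign power...');
console.log(entry.levels[0]);     // prints 1 (the chapter)
console.log(entry.levels.length); // prints 2 (a sub-clause of chapter 1)
```

The number of levels tells you the depth: 1 for a chapter, 2 for a sub-clause, 3 for a sub-sub-clause.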

Get the sub-clauses the same way, with small modifications (`path.split(':')` gives every level at once), and then build the JSON either in a loop or recursively.
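For the loop option, one possible sketch keeps a lookup from each path (e.g. `"1:2"`) to the node already created for it, so each entry can be attached to its parent in a single pass. It assumes every parent entry appears before its children, as in the proposed format; `entriesToTree` is a hypothetical name.

```javascript
// Build a Flare-style {name, children} tree from "path#text" entries in one pass.
// Assumes every parent entry comes before its children in the array.
function entriesToTree(entries, rootName) {
  var root = { name: rootName, children: [] };
  var byPath = { '': root };                        // path string -> node already created
  entries.forEach(function (entry) {
    var hash = entry.indexOf('#');
    if (hash === -1) {
      throw new Error('Missing "#" in entry: ' + entry);
    }
    var path = entry.slice(0, hash);                // e.g. "1:2"
    var levels = path.split(':');
    var parentPath = levels.slice(0, -1).join(':'); // "" for a chapter, "1" for "1:2"
    var parent = byPath[parentPath];
    if (!parent) {
      throw new Error('Parent "' + parentPath + '" not found for entry: ' + entry);
    }
    var node = { name: entry.slice(hash + 1), children: [] };
    parent.children.push(node);
    byPath[path] = node;                            // so later children can find it
  });
  return root;
}
```

`JSON.stringify(entriesToTree(kenyaConstitutionArray, "Constitution of Kenya"), null, 2)` would then give indented JSON in the same shape as Flare.json.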

Another way is to use nested arrays as the input, like:

    var kenyaConstitution = [["chapter1",[["clause1", [["subclause1"], ["subclause2"]]],["clause2",[["subclause2-1"], ["subclause2-2"]]]]]];

Converting the nested array above to JSON is then straightforward. Recursion is a good fit here, since every element has the same `[name, [children...]]` shape at every level.

EDIT:

Complete Code:

    <!DOCTYPE html>
    <html>
    <head>
        <title>JSON</title>
        <script>
            // Input template: each node is [name, [child, child, ...]]; a leaf is just [name].
            function kenyaConstitutionToJSON() {
                var kenyaConstitution = [["chapter1",[["clause1", [["subclause1"], ["subclause2"]]],["clause2",[["subclause2-1"], ["subclause2-2"]]]]]];
                var kenJSON = {
                    name: "Constitution of Kenya",
                    children: []
                };
                if (kenyaConstitution.length === 0) {
                    console.log("Constitution is empty! Please pass constitution through Parliament...");
                    return kenJSON;
                }
                // Convert each chapter recursively and attach it to the root.
                kenyaConstitution.forEach(function (chapter) {
                    kenJSON.children.push(convertToJSON(chapter));
                });
                console.log(JSON.stringify(kenJSON));
                return kenJSON;
            }

            // Turn one [name, [children...]] array into {name: ..., children: [...]}.
            function convertToJSON(constitutionArray) {
                var obj = {
                    name: constitutionArray[0],
                    children: []
                };
                var children = constitutionArray[1] || []; // leaves have no second element
                children.forEach(function (child) {
                    obj.children.push(convertToJSON(child));
                });
                return obj;
            }

            kenyaConstitutionToJSON();
        </script>
    </head>
    <body>
    </body>
    </html>

Place a breakpoint on the `return kenJSON;` line, or log the result with `console.log(JSON.stringify(kenJSON))` before returning, to see the output. It should look like:

OUTPUT:

    {
        "name": "Constitution of Kenya",
        "children": [
            {
                "name": "chapter1",
                "children": [
                    {
                        "name": "clause1",
                        "children": [
                            { "name": "subclause1", "children": [] },
                            { "name": "subclause2", "children": [] }
                        ]
                    },
                    {
                        "name": "clause2",
                        "children": [
                            { "name": "subclause2-1", "children": [] },
                            { "name": "subclause2-2", "children": [] }
                        ]
                    }
                ]
            }
        ]
    }

Hope that helps.

Muhammad Imran
  • Thank you for the insight Imran, could you edit your answer to include the recursive method for converting the array into the JSON data? Thanks again – Odanga Madung Jul 04 '15 at 16:00