I am building an app based on data that I get from an API, formatted as a response similar to the output of the json_encode() function. I don't know how it happens (though I know it is possible), but from time to time that API returns duplicate keys. Since it is not my site and I cannot fix the duplicate keys at the source, I have to find a way to parse this response into a usable array that contains all the data sent to me. I would like to rename duplicate keys by prepending "x01", "x02", "x03"... to them. So far, I have found one solution that could help (here), but it seems to me that it only works for simpler arrays, and I deal with nested arrays. Here is a demo API response:

{"boss":"mike",
    "employees":{
       "Josh":{
           "active":"1",
           "hours":"12",
           "name":"Josh"},
       "Josh":{
           "active":"1",
           "hours":"3",
           "name":"Josh"}
    }
}

As you can see, almost the entire "Josh" entry is duplicated (except for the "hours" sub-key). Even though it looks like an error, both values are important to me. This is the array that I would like to get:

array (
  "boss" => "mike",
  "employees" => array (
    "x01Josh" => array ("active" => "1", "hours" => "12", "name" => "Josh"),
    "x02Josh" => array ("active" => "1", "hours" => "3", "name" => "Josh")
  )
)

I had the idea of going through the response one token at a time, checking whether each is a string, a double quote, a comma or a curly brace, and building a function accordingly. I reckon that would take very long and inefficient code (slow to process).

Since I am new to regex, and since I did not find a solution using it either (I am sure one exists, but I did not manage to find it), I am asking you for help.

I plan to parse it either in JavaScript/jQuery or PHP (using another JSON request). Thanks in advance.

UPDATE: I forgot to mention that the reason I'm doing this is that the second (duplicate) key overwrites the first one. Also, I can't change the API (it's not my site's API). I've already asked them to implement some changes regarding these issues, but in the meantime I have to handle it on my side.
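
The overwriting described in the update can be reproduced with a few lines. This is only an illustrative sketch using a shortened version of the demo response; the variable names `raw` and `parsed` are made up for the example.

```javascript
// JSON.parse does not reject duplicate member names; the last one wins.
var raw = '{"employees":{"Josh":{"hours":"12"},"Josh":{"hours":"3"}}}';
var parsed = JSON.parse(raw);

// Only one "Josh" entry survives, and it holds the later value.
console.log(Object.keys(parsed.employees).length); // 1
console.log(parsed.employees.Josh.hours);          // 3 (the string "3")
```

So the first "Josh" entry is lost before any code gets a chance to look at it, which is why the keys have to be renamed before (or while) parsing.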

David Šili
  • In JavaScript, a duplicate key overwrites the first one. In strict mode, it returns an error. – Nina Scholz Aug 17 '16 at 09:05
  • exactly - that is why I need to parse it in some way so that it can be used without error, or overwriting. – David Šili Aug 17 '16 at 09:14
  • Do these duplicated keys come behind each other exactly? – revo Aug 17 '16 at 09:20
  • Yes. Also, sometimes there is only one instance of "Josh", and sometimes two (and thus duplicates). – David Šili Aug 17 '16 at 09:26
  • Do what you can to figure out why the data is unpredictable. It looks like you're patching around unreliable data, which looks like a waste of time, all the more so if you don't have any control over the external API. – Christian Bonato Aug 17 '16 at 09:29
  • This was just a demo. The real API response is generated based on cryptocurrency mining performance. Whenever one worker (a computer doing the mining) switches from one coin to another, it generates both outputs for the next hour. One output's hashrate is rising and the other's is falling, but for the next hour they both coexist under the same name. After that, the one that dropped to 0 disappears. – David Šili Aug 17 '16 at 09:35

2 Answers


The JS solution using String.replace, String.match, Array.filter and Array.shift functions:

var jsonData = '{"boss":"mike","employees":{"Josh":{"active":"1","hours":"12","name":"Josh"},"Josh":{"active":"1","hours":"3","name":"Josh"}}}',
    // matches the opening quote plus the name of every key whose value is an object
    re = /\"(\w+?)(?=\":\{)/g, names = {}, dups;

// count how often each object-valued key occurs (match() returns null when nothing matches)
(jsonData.match(re) || []).forEach(function (v) {
    v = v.replace(/\"/, "");
    names[v] = (names[v] || 0) + 1;
});

dups = Object.keys(names).filter(function (k) { return names[k] > 1; }); // getting duplicated keys
dups.forEach(function (k) {
    var count = names[k], i;
    names[k] = [];

    // queue one prefix per occurrence: "x01", "x02", ..., "x10", ...
    for (i = 0; i < count; i++) {
        names[k].push("x" + (i + 1 < 10 ? "0" : "") + (i + 1));
    }
});

jsonData = jsonData.replace(re, function (match) { // replacing duplicate keys
    var key = match.slice(1); // drop the leading quote
    if (dups.indexOf(key) !== -1) {
        return '"' + names[key].shift() + key;
    } else {
        return '"' + key;
    }
});

console.log(JSON.parse(jsonData));
RomanPerekhrest
  • It works like a charm and everything that you've written shows exactly like it should based on the given string. However, when I tried my real API string, it did not work because the key had a period inside a name ("agapetos.main"). When I tried just "agapetos" everything worked nicely. Is it possible to add that to a function not to be thrown off by a period? – David Šili Aug 17 '16 at 14:37
  • My real response (not the simplified demo) looks like this: {"username":"agapetos","workers":{"agapetos.main":{"alive":"1","hashrate":"8351","username":"agapetos.main"},"agapetos.main":{"alive":"1","hashrate":"22668","username":"agapetos.main"}}} – David Šili Aug 17 '16 at 14:38
  • @agapetos, ok, I'll do it after the gym (if you haven't got any other solution by then) – RomanPerekhrest Aug 17 '16 at 14:58
  • So, instead of `/\"(\w+?)(?=\":\{)/g` I've put `/\"([\w\.]+?)(?=\":\{)/g` and that did the trick (now it accepts periods in keys as well). Thank you for the great code :) – David Šili Aug 17 '16 at 16:22
  • @agapetos, you're welcome. I'm glad that you guessed that "trick" – RomanPerekhrest Aug 17 '16 at 20:20
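
Putting the comment thread together, the same two-pass idea can be wrapped in a function that uses the `[\w.]` character class from the comments. This is a sketch, not the answerer's code: the function name `renameDuplicateObjectKeys` is hypothetical, and it assumes, like the original snippet, that only keys whose value is an object need renaming. Duplicates are counted across the whole string rather than per parent object.

```javascript
// Hypothetical helper, not part of the original answer.
function renameDuplicateObjectKeys(json) {
    // Key names may contain word characters and periods, e.g. "agapetos.main".
    // The lookahead requires the key to be followed by ":" and an opening brace.
    var re = /"([\w.]+)(?="\s*:\s*\{)/g;
    var counts = Object.create(null); // no inherited names such as "constructor"
    var seen = Object.create(null);

    // First pass: count every object-valued key (the string is left unchanged).
    json.replace(re, function (match, key) {
        counts[key] = (counts[key] || 0) + 1;
        return match;
    });

    // Second pass: prefix only the keys that occur more than once.
    return json.replace(re, function (match, key) {
        if (counts[key] < 2) { return match; }
        seen[key] = (seen[key] || 0) + 1;
        var n = seen[key];
        return '"x' + (n < 10 ? "0" : "") + n + key;
    });
}

var sample = '{"username":"agapetos","workers":{"agapetos.main":{"alive":"1","hashrate":"8351","username":"agapetos.main"},"agapetos.main":{"alive":"1","hashrate":"22668","username":"agapetos.main"}}}';
var workers = JSON.parse(renameDuplicateObjectKeys(sample)).workers;
console.log(Object.keys(workers)); // the two renamed worker keys
```

Keys containing other characters (spaces, dashes, escaped quotes) would still slip past this pattern, so it only fits responses whose key names are known to be this simple.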

Is it not possible to have the API changed? What you have posted is not standard JSON. It returns a map with duplicate keys, not an array of objects.

If you need to "work with what you get", then you probably need to look into building your own parser. Using regular expressions for nested structures is not ideal.
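
As a rough illustration of what such a hand-written parser could look like, here is a minimal recursive-descent sketch in JavaScript. The name `parseKeepingDuplicates` and the "x01"/"x02" prefix scheme (taken from the question) are assumptions for the example, not an established API. String and number decoding is delegated to `JSON.parse` on each token to keep it short, and it is not hardened against unusual keys such as "__proto__".

```javascript
// Hypothetical helper: parses JSON text but keeps duplicate keys by prefixing them.
function parseKeepingDuplicates(text) {
    var pos = 0;

    function skipSpace() {
        while (pos < text.length && /\s/.test(text[pos])) { pos++; }
    }
    function expect(ch) {
        skipSpace();
        if (text[pos] !== ch) { throw new SyntaxError("Expected '" + ch + "' at position " + pos); }
        pos++;
    }
    // Scans a quoted string and lets JSON.parse decode its escapes.
    function parseString() {
        skipSpace();
        var start = pos;
        if (text[pos] !== '"') { throw new SyntaxError("Expected string at position " + pos); }
        pos++;
        while (pos < text.length && text[pos] !== '"') {
            pos += (text[pos] === "\\") ? 2 : 1; // skip escaped characters
        }
        pos++; // closing quote
        return JSON.parse(text.slice(start, pos));
    }
    // Scans a number, true, false or null up to the next delimiter.
    function parseLiteral() {
        var start = pos;
        while (pos < text.length && !/[\s,\]}]/.test(text[pos])) { pos++; }
        return JSON.parse(text.slice(start, pos));
    }
    function parseArray() {
        var result = [];
        expect("[");
        skipSpace();
        if (text[pos] === "]") { pos++; return result; }
        while (true) {
            result.push(parseValue());
            skipSpace();
            if (text[pos] !== ",") { break; }
            pos++;
        }
        expect("]");
        return result;
    }
    function parseObject() {
        var entries = [], counts = Object.create(null), seen = Object.create(null), result = {};
        expect("{");
        skipSpace();
        if (text[pos] === "}") { pos++; return result; }
        while (true) {
            var key = parseString();
            expect(":");
            entries.push([key, parseValue()]);
            counts[key] = (counts[key] || 0) + 1;
            skipSpace();
            if (text[pos] !== ",") { break; }
            pos++;
        }
        expect("}");
        // Keys seen more than once within this object get "x01", "x02", ... prepended.
        entries.forEach(function (entry) {
            var name = entry[0];
            if (counts[name] > 1) {
                seen[name] = (seen[name] || 0) + 1;
                name = "x" + (seen[name] < 10 ? "0" : "") + seen[name] + name;
            }
            result[name] = entry[1];
        });
        return result;
    }
    function parseValue() {
        skipSpace();
        if (text[pos] === "{") { return parseObject(); }
        if (text[pos] === "[") { return parseArray(); }
        if (text[pos] === '"') { return parseString(); }
        return parseLiteral();
    }

    var value = parseValue();
    skipSpace();
    if (pos < text.length) { throw new SyntaxError("Unexpected trailing text at position " + pos); }
    return value;
}

var demo = '{"boss":"mike","employees":{"Josh":{"active":"1","hours":"12","name":"Josh"},"Josh":{"active":"1","hours":"3","name":"Josh"}}}';
console.log(parseKeepingDuplicates(demo));
```

Unlike a regex pass over the raw string, this counts duplicates per object, so a "Josh" key under one parent does not affect a "Josh" key under another, and key names may contain any characters.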

nolan
  • I also reckon that they (the owners of the site behind the API) did not encode it using json_encode(), but rather built it themselves in some way. As you're saying, I need to work with what I get. And that is precisely what this topic is about: how to make a good parser for this kind of situation. I've thought about making one that analyses the response one word at a time, using PHP functions that I'm comfortable with. But since I am still not that good at regex, it might take me either days to come up with a good parser or several hours to come up with huge, inefficient code. – David Šili Aug 17 '16 at 09:19
  • Would [this solution](http://stackoverflow.com/a/30845581/6693643) help? – Dhananjay Aug 17 '16 at 09:24
  • @agapetos I would probably look into building my own state machine for parsing their "JSON" string. Checking whether it is parsing a key or a value and building the equivalent in PHP. With the added note that the value for employees needs to be an array and not a map. – nolan Aug 17 '16 at 09:29
  • @nolan So, if I understood correctly, it would produce something like this: `array ( "boss" => "mike", "employees" => array ( array ("active" => "1", "hours" => "12", "name" => "Josh"), array ("active" => "1", "hours" => "3", "name" => "Josh") ) )` If so, that would do the job well. I'll test it now :) Thanks – David Šili Aug 17 '16 at 09:40
  • Well, I've tried it. It probably works with simple arrays, but with mine it does not. Here is what it returned from mine: [link](https://dl.dropboxusercontent.com/u/10871766/JSONlike.png) – David Šili Aug 17 '16 at 09:56
  • @agapetos Where is that JSON from? The logic that you have made? In JSON, [ ] is the array/list notation, not { }. { } is for an object/map. Instead of `"employees" : {"Josh":{ "active":"1", "hours":"12", "name":"Josh"}, ... }` you want `"employees" : [ { "active":"1", "hours":"12", "name":"Josh"}, ... ]` – nolan Aug 17 '16 at 10:47
  • @nolan you are right, it seems as if that is some totally different type of mapping. – David Šili Aug 17 '16 at 13:18
  • Unfortunately, it seems pretty common for web APIs to return duplicate keys in JSON. Since it is done so much, I'm surprised there isn't a common solution. – Agent Friday Aug 11 '19 at 18:37