I have a JSON string, converted from VDF (Valve Data Format) with a regex, that looks like this:

{"items_game": {
    "prefabs": {
        ...
        "coupon_crate_prefab": {
            "prefab": "weapon_case_base",
            "item_type": "coupon_crate",
            "attributes": {
                "cannot trade": "1"
            },
            "capabilities": {
                "can_delete": "0"
            },
            "attributes": {
                "expiration date": {
                    "attribute_class": "expiration_date",
                    "force_gc_to_generate": "1",
                    "use_custom_logic": "expiration_period_days_from_now",
                    "value": "2"
                }
            }
        },
        "coupon_key_prefab": {
            "prefab": "csgo_tool",
            "item_type": "coupon_key",
            "attributes": {
                "cannot trade": "1"
            },
            "capabilities": {
                "can_delete": "0"
            },
            "attributes": {
                "expiration date": {
                    "attribute_class": "expiration_date",
                    "force_gc_to_generate": "1",
                    "use_custom_logic": "expiration_period_days_from_now",
                    "value": "2"
                }
            }
        }
        ...
    }
}}

Wanted result:

        "coupon_key_prefab": {
            "prefab": "csgo_tool",
            "item_type": "coupon_key",
            "attributes": {
                "cannot trade": "1",
                "expiration date": {
                    "attribute_class": "expiration_date",
                    "force_gc_to_generate": "1",
                    "use_custom_logic": "expiration_period_days_from_now",
                    "value": "2"
                }
            },
            "capabilities": {
                "can_delete": "0"
            }
        }

As you can see, the `attributes` key appears twice in each prefab, and I need to merge the duplicates because duplicate keys are invalid in JSON.
How can I do this? (Probably with preg_replace.)

Shamil Yakupov
  • Just out of curiosity, what happens when you deserialize the object? You should also state your programming language of choice, as regex is not going to solve your problem; you will likely need to address the data structure in your code. Also, deserialization behavior may differ depending on the language you are using. – Mike Brant Jul 01 '15 at 14:05
  • @MikeBrant, only the last `attributes` exists after deserialization. Yes, I'm using PHP. – Shamil Yakupov Jul 01 '15 at 15:34
  • Regarding your last line, don't jump on the preg_replace bandwagon. It really is **never** a good idea to use regexp to manipulate JSON, as it leaves you open to attacks, especially if your source is compromised. JSON is an encoding with a well-defined and _context sensitive_ structure. If you don't manage to mess up the structure or inject bad data yourself, you can be sure a hacker will find a way. – Phil Jul 10 '15 at 21:11

2 Answers

Doing this with a regex is a very bad idea. JSON is a nested data format that can be laid out in many equivalent ways, so a regex-based approach will at best give you brittle code.

I'm also not sure that what you have is valid: if you run your JSON through a parser, the duplicate keys overwrite each other.

use strict;
use warnings;

use JSON;

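# Slurp mode: read everything after __DATA__ as one string.
local $/;
my $json_text = <DATA>;

# Round-trip through the parser; duplicate keys collapse to a single entry.
print to_json( from_json($json_text), { pretty => 1 } );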

__DATA__
{
    "items_game": {
        "prefabs": {
            "coupon_crate_prefab": {
                "prefab": "weapon_case_base",
                "item_type": "coupon_crate",
                "attributes": {
                    "cannot trade": "1"
                },
                "capabilities": {
                    "can_delete": "0"
                },
                "attributes": {
                    "expiration date": {
                        "attribute_class": "expiration_date",
                        "force_gc_to_generate": "1",
                        "use_custom_logic": "expiration_period_days_from_now",
                        "value": "2"
                    }
                }
            }
        }
    }
}

This parses your JSON (which I hope I have fixed up to match your source). Note that it has 'clobbered' part of your data: only one `attributes` entry survives. I think this is common behaviour in most parsing libraries, so whatever consumes your data may well be handling it in the same way.

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

So it's hard to give you a firm answer on what is best to do with this. Ideally you would use a JSON parser, but what you are doing is not defined within the JSON spec, so you will get variable results.
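Some parsers do let you decide what happens to repeated keys. As a minimal sketch (in Python rather than the asker's PHP, purely for illustration), the standard `json` module accepts an `object_pairs_hook` that receives every object as a list of key/value pairs before anything is overwritten. The helper names `merge_duplicate_keys` and `deep_merge`, and the shortened sample data, are assumptions made for this example:

```python
import json


def deep_merge(first, second):
    """Return a new dict with `second` merged into `first`, recursing into dicts."""
    result = dict(first)
    for key, value in second.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_duplicate_keys(pairs):
    """Build one dict from (key, value) pairs, merging repeated keys.

    When a key repeats and both values are objects, they are merged;
    otherwise the later value wins, as in a default parse.
    """
    merged = {}
    for key, value in pairs:
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Shortened, hypothetical version of the data in the question.
text = '''
{"coupon_key_prefab": {
    "prefab": "csgo_tool",
    "attributes": {"cannot trade": "1"},
    "capabilities": {"can_delete": "0"},
    "attributes": {"expiration date": {"value": "2"}}
}}
'''

data = json.loads(text, object_pairs_hook=merge_duplicate_keys)
print(json.dumps(data, indent=4))
```

The hook is called for the innermost objects first, so by the time the two `attributes` entries meet, both are already plain dicts and can be merged key by key.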

Edit: Following on from the comments, it seems VDF is like JSON, but not quite the same.

I still wouldn't use a regex; instead I might try a recursive parse. Key it off {: each time you hit an opening brace, 'hand down' the nested JSON-like content to the same routine, until you reach a bottom branch of named key-value pairs that you can then hashify.
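A recursive parse along those lines might look like the sketch below, written in Python for illustration. It assumes a simplified VDF dialect in which every key and value is a double-quoted string, blocks are delimited by braces, and `//` starts a comment; real VDF files may need more than this. The names `tokenize`, `parse_block` and `merge_into` are hypothetical, and the regex is only used to split the input into tokens, not to understand the nesting:

```python
import json
import re

# Assumed token shapes: a quoted string, a brace, or a // comment (skipped).
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|([{}])|//[^\n]*')


def tokenize(text):
    """Yield ("str", value) and ("brace", "{" or "}") tokens."""
    for match in TOKEN.finditer(text):
        if match.group(1) is not None:
            yield ("str", match.group(1))  # escapes are left undecoded
        elif match.group(2):
            yield ("brace", match.group(2))


def merge_into(target, extra):
    """Merge dict `extra` into dict `target` in place, recursing into dicts."""
    for key, value in extra.items():
        if isinstance(target.get(key), dict) and isinstance(value, dict):
            merge_into(target[key], value)
        else:
            target[key] = value


def parse_block(tokens):
    """Parse key/value pairs until the closing brace (or end of input)."""
    block = {}
    for kind, text in tokens:
        if (kind, text) == ("brace", "}"):
            return block  # end of this block: hand the result back up
        if kind != "str":
            raise ValueError("expected a quoted key")
        key = text
        kind, text = next(tokens, ("end", ""))
        if kind == "str":
            value = text
        elif text == "{":
            value = parse_block(tokens)  # hand the nested content down
        else:
            raise ValueError("expected a value or '{' after key")
        if isinstance(block.get(key), dict) and isinstance(value, dict):
            merge_into(block[key], value)  # repeated key: merge, don't clobber
        else:
            block[key] = value
    return block


sample = '''
"coupon_key_prefab"
{
    "prefab"        "csgo_tool"
    "attributes"
    {
        "cannot trade"      "1"
    }
    "capabilities"
    {
        "can_delete"        "0"
    }
    "attributes"
    {
        "expiration date"
        {
            "value"     "2"
        }
    }
}
'''

parsed = parse_block(tokenize(sample))
print(json.dumps(parsed, indent=4))
```

Because repeated keys are merged while the structure is being built, the result can be serialized to JSON without ever producing the duplicate `attributes` entries.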

If there's still no better answer, I may hack together a Perl example later (sorry, I don't have time at the moment).

You might find something you can use here: http://www.perlmonks.org/?node_id=995856

But that might also be a good example of why NOT to regex this :)

Sobrique
  • "bad idea to do this with regex" - agreed! It's kind of the same issue as with regex and HTML: http://stackoverflow.com/a/1732454/1143126 – RobertG Jul 01 '15 at 15:13
  • I'm converting VDF to JSON with regex because these formats are pretty similar. But in VDF it's valid to have the same key more than once in one "object"; in JSON it's not. I've also updated the question. – Shamil Yakupov Jul 01 '15 at 15:37
  • Really? They thought that something _like JSON_ but that -wasn't quite- JSON was a good idea? How ... irritating. There's a perl VDF parser: https://github.com/killerfish/vdfparser The PHP one irritatingly goes to a broken link. – Sobrique Jul 01 '15 at 16:13

Well, you asked for regex. Is it possible? Probably, if you have a limited number of nested elements inside your attribute of interest. Is it a good idea? No.

The following regex extracts all of the attributes data and handles one level of nested objects within the attribute, as seen at https://regex101.com/r/rC3eK4/6:

(?<=\"attributes\":) (\{(?:(?:[^{]*?\{(?:[^{]|\n)*?\}[^{]*?)+|(?:[^{]|\n)*?)})

Since your example only has one level of nesting, it works well. To support two levels you'd have to extend the pattern with a two-level alternative, and so on for each further level, in order to keep every {} pair balanced. There might be a better way to solve this kind of balanced-bracket problem with regex, but regex is definitely not the best tool for it.
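For comparison, a plain depth counter handles any nesting level without a pattern per level. This is only a sketch in Python, with a hypothetical helper name `extract_object` and made-up sample text; it assumes JSON-style double-quoted strings with backslash escapes, so that braces inside strings are not counted:

```python
def extract_object(text, start):
    """Return the balanced {...} substring that begins at text[start]."""
    if text[start] != "{":
        raise ValueError("start must point at an opening brace")
    depth = 0
    in_string = False
    escaped = False
    for index in range(start, len(text)):
        char = text[index]
        if in_string:
            # Inside a string: only watch for the closing quote.
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return text[start:index + 1]
    raise ValueError("unbalanced braces")


# Three levels of nesting, plus a brace hidden inside a string value.
snippet = '"attributes": {"a": {"b": {"c": "}"}}}, "capabilities": {}'
block = extract_object(snippet, snippet.index("{"))
print(block)
```

The counter only finds where an object ends; merging the duplicate keys still needs a real parse of the extracted text.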

Andris Leduskrasts
  • Thanks for your answer, Andris. I'm converting VDF to JSON with regex because these formats are pretty similar. But in VDF it's valid to have the same key more than once in one "object"; in JSON it's not. I've also updated the question. – Shamil Yakupov Jul 01 '15 at 15:40