
So I have user accounts that have a lot of data, and sometimes people create duplicate accounts and administrators need to merge them non-destructively.

I've written a script that combines the two accounts and stores a record in a merge_record table. Each row of data is stored as one merge_record entry containing the origin account, destination account, action type (deletion, merge, or exclusion), table name, the column that denotes the account number, and an encoded string holding all the key/value pairs in the following format:

columnoneMERGEBANANAEQUALSvalueoneMERGEBANANASPLITTERcolumntwoMERGEBANANAEQUALSvaluetwo

Yes, that's probably hard to read - but my goal was to delimit the data pairs with a string that is incredibly unlikely to ever appear in user data, and to use an equally unlikely string in place of the equals sign. The reason I need this is that I've also created an Undo Merge button, so the whole thing has to be reversible: the Undo Merge scans each row of merge_record, deconstructs the column/value pairs, and inserts them back into table_name under the original account's ID in the column named by account_column.
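
To make that concrete, the undo's deconstruction step does something roughly like this (a simplified sketch - the real script builds dynamic INSERTs from the result, and `encoded_data` / `destination_account` are placeholder names, not my actual columns):

```sql
-- Simplified sketch of the undo's parsing step.
-- Breaks if a value itself contains one of the delimiter strings,
-- which is exactly the problem I'm asking about.
SELECT mr.table_name,
       mr.account_column,
       split_part(p.pair, 'MERGEBANANAEQUALS', 1) AS column_name,
       split_part(p.pair, 'MERGEBANANAEQUALS', 2) AS column_value
FROM merge_record AS mr
CROSS JOIN LATERAL unnest(
       string_to_array(mr.encoded_data, 'MERGEBANANASPLITTER')) AS p(pair)
WHERE mr.destination_account = 42;   -- the merge being undone
```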

However, I still don't like it. Some jerk who read this post could type MERGEBANANAEQUALS as their name, request a merge, and then request an undo. Is there any way to absolutely guarantee that no collisions can occur? Or should I re-design the way the key/value pairs are stored? If so, what is a better way?

  • That is definitely a poor way to store that data, especially since you want to be able to parse it later on. If you want to use key/value pairs, at least use json e.g. `{"columnone":"valueone","columntwo":...}`. Ideally if you know what fields are retained, create actual columns in your merge table (or another table which references the merge table) and store the data in a normalised way. – 404 Nov 29 '18 at 16:59
  • Can't use JSON, account data could include colons, quotation marks or curly braces. If I can guarantee that they don't somehow include escaped characters in it, that'd work, but some things are stored escaped already due to how it's structured. Thousands of fields are retained, and new ones are created every day. I cannot know all key/value pairs, or pre-define all keys due to changing infrastructure as development continues. I need a solution with low maintenance and upkeep, which this provides by iterating over the information_schema.columns and pg_constraints sets. – Kwahn Nov 29 '18 at 17:19
  • In that case I'd definitely recommend using JSON. You don't need to worry about what kind of characters they hold, PG's TO_JSON and similar functions will escape any necessary characters correctly. – 404 Nov 29 '18 at 17:21
  • I see what you mean! Looked at the documentation (https://www.postgresql.org/docs/9.3/functions-json.html), to_json escapes whatever needs escaping and stores escaped text representations of whatever I put in there. Then when undoing, I just recall the JSON, un-escape what's escaped and perform actions on it? I found https://stackoverflow.com/questions/27215216/postgres-how-to-convert-json-string-to-text to undo it - might work. I'll experiment some, thanks! – Kwahn Nov 29 '18 at 17:23
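
To make the comment thread concrete, here is a rough sketch of the JSON-based approach being discussed; the table and column names are only illustrative, not the real schema:

```sql
-- Illustrative only: column names are made up, the real merge_record has more detail.
CREATE TABLE merge_record (
    id                  bigserial PRIMARY KEY,
    origin_account      bigint NOT NULL,
    destination_account bigint NOT NULL,
    action_type         text   NOT NULL,  -- 'deletion', 'merge' or 'exclusion'
    table_name          text   NOT NULL,
    account_column      text   NOT NULL,
    row_data            json   NOT NULL   -- Postgres handles all the escaping
);

-- Capturing a row: to_json escapes quotes, colons, braces, etc. automatically.
INSERT INTO merge_record (origin_account, destination_account, action_type,
                          table_name, account_column, row_data)
SELECT 1, 2, 'merge', 'some_table', 'account_id', to_json(t)
FROM some_table AS t
WHERE t.account_id = 1;

-- Undoing: ->> returns the value as plain, already-unescaped text.
SELECT row_data ->> 'columnone' AS columnone
FROM merge_record
WHERE origin_account = 1;
```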

1 Answer


The simple answer is to create a line of CSV from the data, like this:

datum1,"datum,with,commas","datum with ""double quote""",,42

Just escape everything that's nasty.
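
For illustration, the quoting could be done in plain SQL along these lines (`csv_quote` is a made-up helper, not a built-in):

```sql
-- Made-up helper: quote one CSV field (wrap in double quotes if it contains a
-- comma, a double quote or a newline, and double any embedded double quotes).
CREATE OR REPLACE FUNCTION csv_quote(val text) RETURNS text AS $$
  SELECT CASE
           WHEN val IS NULL THEN ''   -- NULL becomes an empty field
           WHEN val ~ E'[,"\n]' THEN '"' || replace(val, '"', '""') || '"'
           ELSE val
         END;
$$ LANGUAGE sql IMMUTABLE;

-- Build one CSV line from a list of values, preserving their order:
SELECT string_agg(csv_quote(v), ',' ORDER BY ord)
FROM unnest(ARRAY['datum1', 'datum,with,commas',
                  'datum with "double quote"', NULL, '42']) WITH ORDINALITY AS t(v, ord);
-- => datum1,"datum,with,commas","datum with ""double quote""",,42
```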

Laurenz Albe