
The NOOB Developer is back with yet another question. I'm working on importing a large dataset into a PostgreSQL 9.5 database. I originally started out using PHP, but it failed once I tried to load the entire 14 MB file, and increasing the memory limit within the script didn't help. I thought about using a parsing library, but decided that since I'm on PostgreSQL 9.5, I should leverage the database instead. My JSON file has repeatable fields, so I couldn't use JSONB and went with the plain JSON type. That worked until I tried to load the entire file, at which point I got the following error:

ERROR:  invalid input syntax for type json
DETAIL:  Expected JSON value, but found "]".
CONTEXT:  JSON data, line 1: ...:"Commercial & Industrial","u_cost_code":""},"",]
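
For reference, a quick way to check whether the file itself is valid JSON, independent of the database, is a short PHP script (a minimal sketch; json_decode has to hold the whole file in memory):

<?php
// Minimal sketch: verify the raw file parses as JSON at all, so a broken
// file is caught before COPY or the cast to json ever runs.
$raw = file_get_contents('C:\path\to\my\json_file.json');

json_decode($raw);
if (json_last_error() !== JSON_ERROR_NONE) {
    echo 'Invalid JSON: ' . json_last_error_msg() . PHP_EOL;
    // Truncation usually shows up at the tail of the file.
    echo 'File tail: ' . substr($raw, -120) . PHP_EOL;
} else {
    echo 'File parses as valid JSON' . PHP_EOL;
}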

Here is an example of the file's content (shown as PHP print_r output of the decoded structure, not raw JSON):

Array
(
    [result] => Array
        (
            [0] => Array
                (
                    [field1] => 0
                    [field2] => 
                    [field3] => 1900-04-19 19:14:10
                    [field4] => false
                    [field5] => XXX.XXXXX.XXX.XXX.XXX
                    [field6] => ldap:CN=XXXX XXXXXXX,OU=XXXXX,OU=XXXXX,OU=XXX,DC=XXXXXX,DC=XXXX,DC=XXXX
                    [field7] => 1900-07-18 17:45:08
                    [field8] => true
                    [field9] => 
                    [field10] => false
                    [field11] => 2
                    [field12] => 30406
                    [field13] => T
                    [field14] => 00000000000000000
                    [field15] => 1900-01-19 21:33:07
                    [field16] => Array
                        (
                            [link] => https://mozilla.com
                            [value] => mozilla
                        )

                    [field17] => 1601-01-01 06:00:00
                    [field18] => 
                    [field19] => false
                    [field20] => 01001
                    [field21] => 

                )           
        )
)

Here is the statement I'm using to create my table; it allowed me to import the entire 14 MB file without an issue:

CREATE TABLE temp_json 
(
     ID SERIAL NOT NULL PRIMARY KEY
    ,TIMESTAMP TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ,"values" TEXT -- quoted because VALUES is a reserved word in PostgreSQL
);

I started following this developer's example in hopes of resolving the issue: how-to-get-json-data-type-into-postgresql

Here is the fairly standard copy command I'm using to import the data into the table:

copy temp_json("values") from 'C:\path\to\my\json_file.json';

I then tried the following SQL statement, which I found here on Stack Overflow, loading-json-data-from-a-file-into-postgres, in an attempt to move the data into a relational table and find an easier way to work with the data set. Here is the SQL statement I am trying to get working:

insert into table_to_hold_json 
select 
"values"->'calendar_integration' as calendar_integration,
"values"->'country' as country,
"values"->'last_login_time' as last_login_time,
"values"->'u_gartner_acct' as u_gartner_acct,
"values"->'u_dept_name' as u_dept_name,
"values"->'source' as source,
"values"->'sys_updated_on' as sys_updated_on,
"values"->'u_field_user' as u_field_user
from ( select json_array_elements(replace("values",'\','\\')::json->'result') as "values" 
from temp_json ) a;

However, I'm now getting the same error as I did on the original import attempt into the temp_json table. I also tried to keep COPY from eating the backslashes by going through CSV mode with a quote and delimiter character that never appear in the data:

copy temp_json("values") from 'C:\path\to\my\json_file.json' csv quote e'\x01' delimiter e'\x02';

Unfortunately, I still end up with the same error when I try to query the JSON data. So now I'm banging my head against the wall trying to sort out how to escape that darn ']'. Any assistance is greatly appreciated!

Tastedalgory
  • I don't think the issue is really the array bracket, `,]`; it's the comma before it, which needs a value after it, e.g. `,null]`, especially if that bracket closes an open bracket: `["value", ... ,]`. As one would parse it, the `,` triggers the parser to expect a value next, but instead it encounters the `]`, hence `Expected JSON value, but found "]"`. To me that doesn't mean the bracket is bad, but that a value is missing. Just my 2 cents. – ArtisticPhoenix Aug 05 '17 at 02:45
  • Just gave me the thought of writing a JSON parser in PHP, lol; I just wrote a string Lexer\Parser (mainly for HTML) but for other things as well. A quick hackish solution would be to remove the comma where `,]` appears and replace it with either `,""]` or just `]` – ArtisticPhoenix Aug 05 '17 at 02:51
  • @ArtisticPhoenix Thanks for replying to my question. I had not thought of that as part of the issue. I'm currently trying to sort out how to accomplish that, since I cannot change how the file is being formed. I will have to handle it on my end. – Tastedalgory Aug 05 '17 at 03:20
  • That was why I didn't submit an answer; unless you can change the file, or the way it is processed, it will be hard. You could try to edit the file beforehand, as `,]` is not so common that it would be part of the data. – ArtisticPhoenix Aug 05 '17 at 03:23
  • Personally, if it was me, I would build a small test file with that error in it, something with only a few lines, `{ .. data .. [ ..data.. ,]}`, try to parse it to see if that is the issue, then see if simply searching and replacing it would fix the invalid JSON. Pre-processing the file is always an option; not the best for performance, but it could work, especially on a one-time deal type of issue, where performance is not so much a concern. – ArtisticPhoenix Aug 05 '17 at 03:25
  • in PHP it could be as simple as something like this `$file = str_replace(',]', ']', $file);` basically read the file beforehand and remove that `,`; the only issue would be if the string `,]` appears somewhere in the data, but that is somewhat unlikely, although not impossible. It's hackish to be sure, but at the end of the day, if it works. Just do it on a copy :-p – ArtisticPhoenix Aug 05 '17 at 03:30
  • @ArtisticPhoenix That's not a bad idea. I started working on this using a small data set of 100 records, then a data set of 1000 records; no problem with either. It's only when I try to work with the entire data set that this issue comes up, so it is definitely buried in the data set somewhere. Also, this is going to be an ongoing process, so I have to streamline it along the way. – Tastedalgory Aug 05 '17 at 03:32
  • The best bet, performance-wise, would be to edit the parser to account for the null value. – ArtisticPhoenix Aug 05 '17 at 03:34
  • I could write a PHP one (in a few days, maybe ...) using my new and shiny Lexer, https://github.com/ArtisticPhoenix/Evo/tree/master/Evo/Core/Plexer – ArtisticPhoenix Aug 05 '17 at 03:35
  • Of particular interest in that would be the mapping class, https://github.com/ArtisticPhoenix/Evo/blob/master/Evo/Core/Plexer/Mapper/BasicMapper.php and the Lexer here: https://github.com/ArtisticPhoenix/Evo/blob/master/Evo/Core/Plexer/Lexer/BasicLexer.php – ArtisticPhoenix Aug 05 '17 at 03:37
  • Here are 2 extensions I am working on that use that: Events https://github.com/ArtisticPhoenix/Evo/tree/master/Evo/Core/Commander/EventPlexer and HTML (which is probably broken right now) https://github.com/ArtisticPhoenix/Evo/tree/master/Evo/App/Plugin/HtmlPlexer (these are part of a larger framework). My point is that it's possible to write your own parser for this with some work; obviously, making a small change to an existing one would be faster, or just replacing the string, but then you are processing the file an extra time. – ArtisticPhoenix Aug 05 '17 at 03:49
  • @ArtisticPhoenix So I started working on creating the sample set with the error in it. When I got to the end of the rather large file, I saw this: "",],"error":{"detail":"Transaction cancelled: maximum execution time exceeded","message":"Transaction cancelled: maximum execution time exceeded"},"status":"failure"}. I'm now working on breaking up my record request into multiple files, each under the execution time limit. I feel like a complete NOOB! Thank you for taking a look! – Tastedalgory Aug 05 '17 at 11:15
  • That makes sense, as that would cut the file off and of course make the JSON invalid. – ArtisticPhoenix Aug 05 '17 at 19:34
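
A runnable version of the pre-processing idea from the comments might look like this (a rough sketch; the file names are placeholders, and it is only safe if `,]` never occurs inside legitimate string values):

<?php
// Rough sketch of the pre-processing idea from the comments: strip the
// dangling comma before a closing bracket/brace so the file becomes
// parseable JSON. File names are placeholders.
$file = file_get_contents('json_file.json');

// Drop the comma in ",]" and ",}"; only safe if those character pairs
// never occur inside legitimate string data.
$file = str_replace([',]', ',}'], [']', '}'], $file);

file_put_contents('json_file_clean.json', $file);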

1 Answer


Okay, so I went back and worked out how to break up my file download from the data provider. Now that I'm keeping each request under the specified timeout period, I can use PHP or whatever else I want to parse the data. This is a good reminder to always check your logs and data sets closely. :-)
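
Since this is going to be an ongoing process, the download now runs in pages. A rough sketch of the idea (the endpoint and the limit/offset parameter names are hypothetical; use whatever your data provider's API actually supports):

<?php
// Hypothetical sketch: page through the data provider's API so each
// response stays under its execution time limit, validating every chunk
// before saving it. Fetching over HTTP with file_get_contents() assumes
// allow_url_fopen is enabled.
$base   = 'https://provider.example.com/api/records'; // placeholder endpoint
$limit  = 1000; // records per request; tune to stay under the timeout
$offset = 0;
$part   = 0;

do {
    $body = file_get_contents($base . '?limit=' . $limit . '&offset=' . $offset);
    if ($body === false) {
        throw new RuntimeException("Request failed at offset $offset");
    }

    // Catch a truncated or otherwise invalid response immediately,
    // instead of at COPY time inside PostgreSQL.
    $data = json_decode($body, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException("Bad JSON at offset $offset: " . json_last_error_msg());
    }

    file_put_contents(sprintf('chunk_%04d.json', $part++), $body);
    $offset += $limit;
} while (!empty($data['result'])); // an empty page means we're done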

Tastedalgory