I have a very large data set that needs a database, and I am using Windows10 PowerShell as part of my development environment for the CLI and shell interface with both MongoDB and Python;
I have chosen MongoDB as the best solution for the needs of this data and future analysis with Python;
Within the data, I have Python tuples to contain units of 3 numbers with each separated by a comma - (x, y, z)
- and therefore it was necessary to choose semicolons ;
as the delimiter for the CSV file; Furthermore, MongoDB also has issues with parentheses, so I refactored to be Python Lists/Arrays [x, y, z]
; However, MongoDB apparently has an issue with this, so I simplified the data to be a simple text string representation "[x, y, z]"
in order to first isolate the issues described here below:
Evidently, MongoDB has an issue with importing CSV files with semicolons as delimiters, so I have had to recalculate the data to a new CSV file using tabs "\t"
as the delimiter;
Using mongosh
shell commands below (together with the status code messages), I am able to successfully mongoimport
the CSV file to create the database with the collection;
./mongoimport --type tsv -d testdbcsv -c testcollection --headerline --drop dataCSV.csv
PS C:\mongodbdbtools\bin> ./mongoimport --type tsv -d testdbcsv -c testcollection --headerline --drop dataCSV.csv
2022-01-10T11:24:01.487+0200 connected to: mongodb://localhost/
2022-01-10T11:24:01.535+0200 dropping: testdbcsv.testcollection
2022-01-10T11:24:01.611+0200 30 document(s) imported successfully. 0 document(s) failed to import.
PS C:\mongodbdbtools\bin>
For the challenge - and to test and learn new aspects of MongoDB - I also created a JSON file of the exact same data in the CSV in order to see and test if the two MongoDB databases that I create from two separate files (with the same data) would produce identical MongoDB databases;
I have already tested and validated the JSONObjectForExport
as a valid JSON file which is a JSON text string containing a List/Array of Dictionaries/Objects;
However, using the two Windows PowerShell commands to mongoimport
the JSON file, I am getting the following messages depending upon if I add the --jsonArray
argument;
./mongoimport -d testdbjson -c testcollection --drop dataJSON.json --jsonArray
PS C:\mongodbdbtools\bin> ./mongoimport -d testdbjson -c testcollection --drop dataJSON.json --jsonArray
2022-01-10T11:25:05.513+0200 connected to: mongodb://localhost/
2022-01-10T11:25:05.562+0200 dropping: testdbjson.testcollection
2022-01-10T11:25:05.563+0200 Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
2022-01-10T11:25:05.563+0200 0 document(s) imported successfully. 0 document(s) failed to import.
PS C:\mongodbdbtools\bin>
./mongoimport -d testdbjson -c testcollection --drop dataJSON.json
PS C:\mongodbdbtools\bin> ./mongoimport -d testdbjson -c testcollection --drop dataJSON.json
2022-01-10T11:25:24.816+0200 connected to: mongodb://localhost/
2022-01-10T11:25:24.864+0200 dropping: testdbjson.testcollection
2022-01-10T11:25:24.867+0200 Failed: cannot decode string into a primitive.D
2022-01-10T11:25:24.867+0200 0 document(s) imported successfully. 0 document(s) failed to import.
PS C:\mongodbdbtools\bin>
Here is an example of a simplified JSON file with the JSON text string format:
"[{\"KeyField1\": 0, \"KeyField2\": \"[0, 1, 2]\"}]"
So the CSV file imports fine; the JSON file does not import even though it is valid JSON of the exact same data that is being written to both CSV and JSON files;
What could be the issue(s) here?
Thanks for the help;
EDIT: More details of the Python code that produces the JSON File per request to clarify:
## CONVERT/SERIALIZE DICTIONARY TO JSON:
JSONObjectForExport = json.dumps(ListOfDictionaries)
## BEGIN VALIDATE JSON
## BEGIN DEFINE FUNCTION
def validateJSON(jsonData):
try:
json.loads(jsonData)
except ValueError as err:
return(False)
return(True)
## END DEFINE FUNCTION
## CALL FUNCTION
isValid = validateJSON(JSONObjectForExport)
## TEST PRINT OUTPUT
print("Given JSON string is Valid", isValid)
## END VALIDATE JSON
## WRITE DATA AS JSON FILE -
with open("dataJSON.json", "w") as f_json:
json.dump(JSONObjectForExport, f_json)
EDIT: Summary of facts and issues:
1.) Python calculates data and writes the same data to valid CSV and valid JSON format files.
2.) MongoDB successfully imports data from CSV to create db.
3.) MongoDB fails in two ways with two different error messages because neither code works to import same data in JSON format.