I'm working on an application that has the following use case:
- Users upload csv files, which need to be persisted across application restarts
- The data in the csv files need to be queried/sorted etc
- Users specify the query-able columns in a csv file at the time of uploading the file
The currently proposed solution is:
- For small files (much more common), transform the data into xml and store it either as a LOB or in the file system. For querying, slurp the whole data into memory and use something like XQuery
- For larger files, create dynamic tables in the database (MySQL), with indexes on the query-able columns
Although we have prototyped this solution and it works reasonably well, it's keeping us from supporting more complex file formats such as XML and JSON. There are also a few more niggling issues with the solution that I won't go into.
Considering the schemaless nature of NoSQL databases, I though they might be used to solve this problem. I have no practical experience with NoSQL though. My questions are:
- Is NoSQL well suited for this use case?
- If so, which NoSQL database?
- How would we store csv files in the DB (collection of key-value pairs where the column headers make up the keys and the data fields from each row make up the values?)
- How would we store XML/JSON files with possibly deeply hierarchical structures?
- How about querying/indexing and other performance considerations? How does that compare to something like MySQL?
Appreciate the responses and thanks in advance!
example csv file:
employee_id,name,address
1234,XXXX,abcabc
001001,YYY,xyzxyz
...
DDL statement:
CREATE TABLE `employees`(
`id` INT(6) NOT NULL AUTO_INCREMENT,
`employee_id` VARCHAR(12) NOT NULL,
`name` VARCHAR(255),
`address` TEXT,
PRIMARY KEY (`id`),
UNIQUE INDEX `EMPLOYEE_ID` (`employee_id`)
);
for each row in csv file
INSERT INTO `employees`
(`employee_id`,
`name`,
`address`)
VALUES (...);