
I have to upload CSV file data into the database, where the row headers will become the column names. These headers can vary per CSV file, apart from some mandatory headers. I would like to know how to design the database structure for this kind of requirement. I am thinking of creating the columns dynamically, but my doubt is: what if the next uploaded file doesn't have the same row headers?

Please provide some tips on how to proceed with this. Thanks

Vivekh
  • Would there be any columns which would definitely be part of the CSV? – bit Apr 09 '15 at 06:04
  • Yes, there are 6 of them. – Vivekh Apr 09 '15 at 06:07
  • Then you could create a table in your DB for those 6 (constant) columns, and the rest of the columns could be created (as nullable) as and when you encounter them. – bit Apr 09 '15 at 06:09
  • Yes, but we also need to give those columns proper names, which are known only at the time the user uploads the file. – Vivekh Apr 09 '15 at 06:11
  • You need to decide whether you will be creating a new column for every new header name that you come across, or whether you need to map them appropriately. – bit Apr 09 '15 at 06:12
  • 1
    Not sure why someone votes this down to -1. Seems a bit harsh to me since it is a valid question. I have upvoted it. – Peter Lange Apr 09 '15 at 06:30

2 Answers


I think your problem, to be honest, is one of architecture. Let me sum up what I believe the problem you are stating is. You have users that will upload CSV files containing data, the first row of which is the "column header" or label for that data. You need to be able to store that data in your database without knowing for sure what those column headers will be called. And, to further complicate things, the next consecutive CSV file may have completely different column headers.

There is no solution where this will fit into a neat, easily packaged table structure. Even if you create generic columns called "User Defined 1", "User Defined 2", etc., it would be extremely difficult and error-prone to map every file onto those fields correctly.

Instead you should explore three alternative options:

1) Store the data in a non-linear structure. Have two tables: one holds the static columns you know will never change, along with a primary key. The second table has a one-to-many relationship with the first, and each of its rows represents a single data element (header name and value) associated with a row in the first table. A sketch of this layout follows the list.

2) Store the data in a NoSQL database. I am not really super familiar with these myself, but it is my understanding that they are document-based, can hold documents with varying object graphs, and provide mechanisms for querying those.

3) If you can't switch to a NoSQL database, and assuming you are on SQL Server, store the data as XML. You can leverage SQL Server's built-in XML parsing functions to still query the data, and since the variable part is all stored in a single XML value you don't need to worry about column header mappings. A sketch of this also follows below.
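
As a rough sketch of option 1, assuming SQL Server (every table and column name below is invented for illustration; the six mandatory columns are stand-ins):

    -- Fixed part: one row per CSV row, holding the columns that never change.
    CREATE TABLE UploadRow (
        UploadRowId   INT IDENTITY(1,1) PRIMARY KEY,
        MandatoryCol1 NVARCHAR(100) NOT NULL,  -- stand-ins for the 6 known columns
        MandatoryCol2 NVARCHAR(100) NOT NULL
        -- ...
    );

    -- Variable part: one row per (CSV row, extra column) pair. The header
    -- name becomes data rather than schema, so a new header needs no ALTER TABLE.
    CREATE TABLE UploadValue (
        UploadRowId INT NOT NULL REFERENCES UploadRow (UploadRowId),
        HeaderName  NVARCHAR(100) NOT NULL,
        HeaderValue NVARCHAR(MAX) NULL,
        PRIMARY KEY (UploadRowId, HeaderName)
    );

And a minimal sketch of option 3, again assuming SQL Server; value() and exist() are the built-in XML methods referred to above:

    CREATE TABLE UploadXml (
        UploadXmlId INT IDENTITY(1,1) PRIMARY KEY,
        RowData     XML NOT NULL
    );

    INSERT INTO UploadXml (RowData)
    VALUES (N'<row><Name>Widget</Name><Price>9.99</Price></row>');

    -- Pull a "column" out by header name, casting as needed.
    SELECT RowData.value('(/row/Price)[1]', 'DECIMAL(10,2)') AS Price
    FROM UploadXml
    WHERE RowData.exist('/row/Price') = 1;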

A final thing to note: in the scenario you describe, there probably isn't a solution that won't exact some cost in performance. Structure frequently exists for a reason, and while that doesn't mean we can't set structured data aside, it does mean we have to accept that cost.

Peter Lange

I have done this exact same thing. Our requirement was an application that would accept data dumps from multiple companies. The data had to contain a handful of known columns (although even the names of these columns could differ from company to company), but the remaining non-mandatory columns could vary in number.

We chose to have a "primary" table to hold the mandatory information, and for the remaining columns we went with an Entity-Attribute-Value (EAV) style table (sketched below). I know this structure is generally frowned upon, but it was the solution we chose, and while not perfect, it worked OK for us.
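
A minimal sketch of that layout, again assuming SQL Server; CompanyRecord and RecordAttribute (and all their columns) are hypothetical names, not the ones we actually used:

    -- "Primary" table: the mandatory, known-in-advance fields.
    CREATE TABLE CompanyRecord (
        RecordId    INT IDENTITY(1,1) PRIMARY KEY,
        CompanyName NVARCHAR(100) NOT NULL
        -- ... the other mandatory fields
    );

    -- EAV table: one row per optional (header, value) pair for each record.
    CREATE TABLE RecordAttribute (
        RecordId  INT NOT NULL REFERENCES CompanyRecord (RecordId),
        AttrName  NVARCHAR(100) NOT NULL,
        AttrValue NVARCHAR(MAX) NULL,
        PRIMARY KEY (RecordId, AttrName)
    );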

We could essentially reconstruct the original data using pivot queries with dynamic columns (a sketch follows below). I'll admit that this gets slow when the tables get huge, but most of the queries we required were on the "primary" table, and the attributes were only pulled out for records of interest, so it worked for us.
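
A sketch of such a dynamic pivot, built on the hypothetical tables above: collect the distinct attribute names into a column list, then assemble and execute the pivot statement dynamically. (The FOR XML PATH trick is the pre-2017 idiom for string aggregation; on SQL Server 2017+ STRING_AGG would be simpler.)

    DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

    -- Build a bracketed column list from the distinct attribute names,
    -- e.g. [Colour], [Price], [Weight].
    SELECT @cols = STUFF((
        SELECT DISTINCT ', ' + QUOTENAME(AttrName)
        FROM RecordAttribute
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '');

    -- Assemble and run the pivot: one output row per record, one output
    -- column per attribute name seen in the data.
    SET @sql = N'
    SELECT RecordId, CompanyName, ' + @cols + N'
    FROM (
        SELECT c.RecordId, c.CompanyName, a.AttrName, a.AttrValue
        FROM CompanyRecord AS c
        LEFT JOIN RecordAttribute AS a ON a.RecordId = c.RecordId
    ) AS src
    PIVOT (MAX(AttrValue) FOR AttrName IN (' + @cols + N')) AS pvt;';

    EXEC sp_executesql @sql;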

Here is another post I had regarding this. It refers to some other SO posts that might provide useful insights.

Mr Moose