
We have some datasets with more than 5,000 columns, which is more than Microsoft SQL Server supports. The data files are medical/genetics data and cannot be de-normalized. We need to join tables very frequently to answer queries.

I did some research and found that Google Bigtable allows up to 10,000 columns, but we are not allowed to upload datasets to the cloud; they must be used locally.

Also, the backend application is written in C# (WPF) and we are going to continue using it ... I was wondering, is there any DBMS out there that supports wide tables, joins, and has an API for .NET?

As an alternative solution, we could store each table as a document in MongoDB and handle joins on the application side, but I am not sure if it will perform well.

We are building a data warehouse to help COPD researchers. It's completely non-profit :)

Thanks so much for your help in advance :)

Shane
  • You may want to investigate this caveat — "...your total fixed and variable length data are still limited to 8019 bytes..." — instead of just the ability to have the number of columns you seek. I found it for both PostgreSQL and SQL Server (SQL Server has "wide tables" that can support up to 30k columns, but most values would need to be NULL for a given row to keep you under the 8019-byte limit on data). To be honest, this doesn't sound very feasible anyway. Even if you got the data into these massively wide tables, the indexes needed to support your queries would be massive and inefficient. – user7396598 Feb 02 '18 at 21:39
  • Thanks for your comment. I considered using wide tables in SQL Server, but as you said, most values must be NULL in that case so the total row size stays under 8019 bytes. In our case, however, there are no null values, so it wouldn't work ... So I was thinking maybe there are other ways, like NoSQL databases, but the issue with those is that most of them don't support join operations at all ... – Shane Feb 02 '18 at 22:02
  • If you need joins and query results, for building reports and such, you want a SQL db. Unfortunately none of the 4 big players support these large numbers of columns. I know you said you couldn't "de-normalize" the data, but I have some questions that might be worth considering. How many sources of info do you have? Do all the sources provide it in the same format to you? Can you logically split the raw data into multiple data tables? – user7396598 Feb 02 '18 at 22:12
  • Thanks for your comment @user7396598. It is possible to split the datasets (CSV files) into smaller files. However, the problem is that some future datasets are going to have more than 10,000 columns, so we thought performance and maintenance could become an issue. Especially because we are doing this for free and will stop supporting it after a while, we would like to design something that lets researchers keep adding their datasets without facing issues, at least in the short term ... – Shane Feb 05 '18 at 03:46

1 Answer


This is a problem which lies beyond the planned scenarios of every DBMS I know of, so I have to resort to a hack. I suspect it has problems, but there's nothing like posting on Stack Overflow to learn what they are. :-)

Create a key-value table in your SQL Server database. Create a doc id for each record; it'll probably be a GUID. So the table will be (docid, columnid, value). The key for your table will be (docid, columnid). Add an index on (columnid, value). Then you can use table joins and support an unlimited number of columnids.
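
Here is a minimal sketch of what I mean, assuming the values can be stored as text. All object and column names (GenotypeData, PhenotypeData, patient_id) are placeholders, not anything from your actual datasets:

    -- Key-value (EAV) layout: one row per (record, column) pair.
    CREATE TABLE dbo.GenotypeData (
        docid    UNIQUEIDENTIFIER NOT NULL,  -- one GUID per original record
        columnid NVARCHAR(128)    NOT NULL,  -- original column name
        value    NVARCHAR(256)    NOT NULL,  -- value stored as text
        CONSTRAINT PK_GenotypeData PRIMARY KEY (docid, columnid)
    );

    -- Secondary index so you can filter and join on a column's value.
    CREATE INDEX IX_GenotypeData_Column_Value
        ON dbo.GenotypeData (columnid, value);

    -- A second dataset stored the same way (dbo.PhenotypeData, same shape)
    -- can be joined on a shared logical column such as patient_id.
    SELECT g.docid AS genotype_doc, p.docid AS phenotype_doc
    FROM dbo.GenotypeData  AS g
    JOIN dbo.PhenotypeData AS p
        ON  g.value    = p.value
        AND p.columnid = 'patient_id'
    WHERE g.columnid = 'patient_id';

The index on (columnid, value) is what makes that join reasonable; without it, every lookup would scan the whole table.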

I found a discussion of the pros and cons of a key-value table here: Key value pairs in relational database

Jeffrey Rennie