
I have about 100K tables that have either 2, 3, or 4 columns. Each table has about 5-10 rows of data, but each column has its own unique column name and data type. Here are 6 examples of column names out of the 100K tables (each table has about 10 lines):

Property Name | Default Type | Description | Reference

Type | Property | Description | Reference

Code | Description | Reference

Value | Reference

Type | Description | Reference

Number | Description | Reference

At this time everything is stored in 100K HTML files, but I want to store them in a DB. I was wondering if anyone has any suggestions about how to store them with the minimum possible number of tables.

The solutions I was thinking of are:

  1. Creating about 100K tables (last-resort solution)
  2. Having three sets of tables (4+1, 6+1, and 8+1 columns) and using the odd columns for the column titles and the even columns for the values. Each row has a foreign key that specifies which source table the data belongs to, so a query based on column name and foreign key can retrieve my data (sketched below).
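To make solution 2 concrete, here is a rough sketch of the 2-column case (4+1 columns). Since this is for a Django site (mentioned in the comments below), it is written as Django models; the model and field names are my own placeholders:

```python
# models.py -- a sketch of solution 2 for the 2-column case (4+1 columns).
# Model and field names are hypothetical.
from django.db import models

class SourceTable(models.Model):
    """One record per original HTML table (the '+1' foreign key target)."""
    name = models.CharField(max_length=255)  # e.g. the source file name

class Row2Col(models.Model):
    """One row of a 2-column source table: odd fields hold the column
    titles, even fields hold the values."""
    source = models.ForeignKey(SourceTable, on_delete=models.CASCADE)
    title1 = models.CharField(max_length=255)  # odd column: column name
    value1 = models.TextField()                # even column: value
    title2 = models.CharField(max_length=255)
    value2 = models.TextField()
```

A lookup by column name and foreign key would then be something like `Row2Col.objects.filter(source_id=42, title1='Value')`.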

I was wondering if anyone has a better solution to this?

user996554
  • Is the set of columns really unique across 100K files? Either way, this data is not very structured, so storing it in 100K RDBMS tables would be quite insane. – pvg Dec 13 '15 at 22:33
  • Some further explanation of the data you're storing, with some examples of 'similar' table structures, would go a long way to determining if you can reduce the number of tables. – Alex.Ritna Dec 13 '15 at 22:37
  • @pvg Basically each customer defines their table columns based on the provided data. Some customers have to provide a port number, others a hex value, others a description, or many other kinds. – user996554 Dec 13 '15 at 22:47
  • What is the total number of distinct column names; have you looked? I mean, if the data is completely unstructured, what possible useful thing can you do with it if it happens to go into a DB? You need to tell us more about what structure there is. – pvg Dec 13 '15 at 22:51
  • @pvg Almost 2 out of 3 columns are distinct. At this time everything is hard-coded in HTML files; to adjust a value in a column, someone has to update an HTML file and deploy almost everything to production. By having them in a database, updating and searching will be easier. I should mention that this is a Django website. – user996554 Dec 13 '15 at 23:13
  • Are you saying you have 60K distinct column names or somesuch? – pvg Dec 13 '15 at 23:14
  • That's why I was thinking of doing the odd/even columns with the column name in the odd column... But this is a very strange solution. – user996554 Dec 13 '15 at 23:19
  • That is correct... about 60K different distinct column names. – user996554 Dec 13 '15 at 23:20
  • Could you please show some of your tables with data? I cannot imagine that this approach is the best solution... – Shnugo Dec 13 '15 at 23:32
  • Multiple customers are defining their own tables and columns in a single database? – Mike Sherrill 'Cat Recall' Dec 13 '15 at 23:45
  • Tag Name | Reference | Status | Description ==> (aspf | RFC7489 | current | SPF alignment)
    Common Name | URI | Reference ==> (DSKPP-PRE-AES | urn:ietf:params:xml:ns:keyprov:dskpp:prf-aes-128 | RFC6360)
    Value | Switching Granularity | Reference ==> (255 | Reserved | RFC6003)
    – user996554 Dec 14 '15 at 00:17
  • These are sample data – user996554 Dec 14 '15 at 00:36
  • At this time, our customers send us an email stating their change request. After applying those changes to some servers or systems, we have to modify an HTML page for that customer with the latest changes to track it. – user996554 Dec 14 '15 at 00:39
  • Really, I cannot imagine how such a system could ever run error-free. You might think about using XML data within SQL Server. But without knowing your processes in detail it is impossible to give you better advice. – Shnugo Dec 14 '15 at 01:48
  • You probably want to group these together into a smaller number of tables. At a certain point any database administration tool will just melt and die if you connect to a server with that many tables. Think about proper data normalization, and also look at more schemaless options like Postgres and its JSON data columns. – tadman Dec 14 '15 at 04:07

2 Answers


100K tables is both bad design and potentially bad performance. Since each table is implemented using 1-3 files, you are depending on the efficiency of the OS for file lookup. Also, MySQL table caches are not designed for huge numbers like that.

MySQL, on the other hand, is quite happy to handle a million-row table. Or a small number of large tables.

What distinguishes one CSV file from another? Could that be another 'column' in the table it goes into? And then have all the files that share the same structure (e.g., Value + Ref) go into a table with those columns plus one more?

Consider doing the loading in this manner:

  1. CREATE TABLE
  2. LOAD DATA to bring one file into that table
  3. Manipulate it (if necessary) in SQL and copy it into the ultimate table (INSERT ... SELECT ...)

Then repeat for each file.

With 100K files, you probably want to write a program that creates the code to do the 100K steps.
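A minimal sketch of such a driver, assuming Python with the mysqlclient package and a MySQL server with local_infile enabled; the table and file names (staging, value_ref, csv/*.csv) are placeholders:

```python
# Sketch of a loader that repeats the three steps above for each file.
# Assumes the mysqlclient package and a MySQL server with local_infile
# enabled; all table and file names are placeholders.
import glob
import MySQLdb

conn = MySQLdb.connect(db="registry", local_infile=1)
cur = conn.cursor()

# The ultimate table, created once; source_name records which file
# each row came from (the extra 'column' mentioned above).
cur.execute(
    "CREATE TABLE IF NOT EXISTS value_ref ("
    " source_name VARCHAR(255), val TEXT, ref TEXT)"
)

for path in sorted(glob.glob("csv/*.csv")):
    # 1. CREATE TABLE: a scratch table matching the file's layout
    cur.execute("CREATE TABLE staging (val TEXT, ref TEXT)")
    # 2. LOAD DATA: bring one file into the scratch table
    cur.execute(
        "LOAD DATA LOCAL INFILE %s INTO TABLE staging"
        " FIELDS TERMINATED BY ',' IGNORE 1 LINES",
        (path,),
    )
    # 3. Manipulate if necessary, then copy into the ultimate table
    cur.execute(
        "INSERT INTO value_ref (source_name, val, ref)"
        " SELECT %s, val, ref FROM staging",
        (path,),
    )
    cur.execute("DROP TABLE staging")

conn.commit()
```

Every file passes through the same scratch table, so the per-file column titles never have to become schema in the ultimate table.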

– Rick James

You can convert the HTML files into CSV as stated here and import the data into a MySQL database.
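If pandas is an option, a minimal sketch of the HTML-to-CSV step could look like this (the file names are placeholders, and pandas needs lxml or html5lib installed to parse HTML):

```python
# Sketch: dump every <table> in one HTML page to its own CSV file.
# File names are placeholders; requires pandas plus lxml or html5lib.
import pandas as pd

# read_html returns one DataFrame per <table> found in the page
for i, table in enumerate(pd.read_html("customer0001.html")):
    table.to_csv(f"customer0001_table{i}.csv", index=False)
```

Each resulting CSV can then be loaded with LOAD DATA INFILE as in the answer above.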

– Muhammad Muazzam