5

There's a very similar question: Modeling products with vastly different sets of needed-to-know information and linking them to lineitems? But I can't find an answer that help me;

Someone at the above Q&A points to designing database to hold different metadata information , which has a fantastic accepted answer, but since search function is explicitly needed in my program, I don't want performance to be compromised.


I'm a "technician" that uses PHP + Oracle to keep track of the selling progress of our company and generate reports. Our workflow generally looks like this:

  1. Marketing guys provides prepared data-set to my system;
  2. Frontline staffs (sales) mark progress on my system;
    • Anyone can search results in the system;
  3. I generate reports back to marketing guys.

The problem:

Many columns of data-sets are the same (or can be considered the same), like these:

account|customer_name|gender|location|program_segment|...

But the marketing dept. like coming up new ideas (and abandoning existing ones), so each "sales program (campaign)" may has its own data, e.g.

For program 1, they may contain:

...|prev_coupon_code|last_usage_amount|...

For program 2, however, they may contain:

...|is_in_plan_1|is_in_plan_2|...

You got the idea.

Unsuccessful attempts:

  • In order to hold all data, I used to use a "long enough" table that has all possible properties (columns), and leave blank/unnecessary properties NULL.

    But now I feel that it will never be "long enough", as there're too many "properties" and even more "sales focusing point": I drafted a 41 column table for a new version of the system and suddenly they proposed a new program that has information that can't fit.

  • Someone suggested me to create "dummy columns" in the table and "remember" different meaning of them in frontend. This can work for several datatypes, like NUMBER(1) for Y/N, DATE, etc., but when talking about VARCHAR2, I'm not sure how many of that is enough...plus this makes the table look "dirty".

Question:

Frustrated, I'm now seriously considering using different tables for different programs, and use UNION clause to generate big report in case they're asked "how are we selling this month/season/year?"

Technically, is this a good practice? Should I implement it?


Edit #1:

To clarify, one "sales program" will generally be running for a few months before it got abandoned, and there'll be at least one data-set per month for each running program.

And there can be more than one program running at the same time.

Edit #2:

Those "program-specified" columns are of various number: one program may need 10, while another may only need 1.

Community
  • 1
  • 1
Passerby
  • 9,715
  • 2
  • 33
  • 50
  • is their a need for the data to be up to date (last minute) when you make the report or could it be ok if the data is updated once or twice a day for creating your reports? same question for the search function – Sebas Feb 28 '13 at 12:52

4 Answers4

2

This is one of those situations where there is no right answer, just a choice of kludges.

I would plump for using an XMLType to hold the transient data structures. XML gives us the ability to have defined schemas for each plan, but using an XMLType obviates the need to change the database itself. We can index XPath queries so the performance can still be good. Find out more.

The one problem is that writing queries against XML is a bit of a pfaff, but I think awkward queries will be an issue for whichever apporach you take.

APC
  • 144,005
  • 19
  • 170
  • 281
  • This is (kind of) what I was driving at. Add in free-text indexing and the users can decide themselves (within reason) what their reporting criteria are. – Hugh Jones Feb 28 '13 at 12:47
1

You may or may not be aware that it is possible to index the contents of a character LOB in Oracle. You might look up Oracle Intermedia / multimedia (depends on your version) and talk to your DBAs to see if it is available to you.

This would make it possible to create a common structure for common data items - eg campaign, start_date, end_date, &c but then to dump your spreadsheet/xml data/csv file into a CLOB field.

The plain-text indexing is not as hard as it first sounds and it is very cute indeed.

Hugh Jones
  • 2,706
  • 19
  • 30
  • This may not work as the "unexpected extra columns" are number-undetermined. I'll update my question with this info. Anyway, will this work if the "extra columns" are numeric/datetime? – Passerby Feb 28 '13 at 09:56
  • It can be made to work, yes. It somewhat depends on how you are currently capturing data - for example, if you stored the data as xml then a datetime would have a known text-format. Xml would be a good choice because you have the field names and the values alongside each other. – Hugh Jones Feb 28 '13 at 12:42
  • +1 Cool to know. I'm now considering this (and XML) approach. Sometimes I hate marketing... – Passerby Mar 01 '13 at 03:08
0

If you go down the different table path you will forever be changing code to meet the changing columns etc.

One option would be to have 2 additional columns 'campaign_name', 'campaign_value' and put the column name they send you in the NAME column and the value in the value column.

So,

account|customer_name|.....|campaign_name|campaign_value
'ACC001'|'Frank Burns'|........|'prev_coupon_code'|[value of prev_coupon_code

and then in your 2nd example:

account|customer_name|.....|campaign_name|campaign_value
'ACC001'|'Frank Burns'|........|'is_in_plan_1'|[value of is_in_plan_1

Update - yes, this would involve changing the grain of the table so you would add a set of data for each of the campaigns. The import would be a little different in that you'd UNION the records for each of the column names that appear on there, and the reporting would need to take into account the grain change.

It sounds like a complete waste of space, but if these are Excel sheets then performance shouldn't matter. If it did you would need to split the tables into - campaigns, accounts, accounts_campaigns

acutesoftware
  • 1,091
  • 3
  • 14
  • 33
  • What about the `last_usage_amount` in my example 1, and `is_in_plan_2` in my example 2? Are you suggesting multiple row for a same record? – Passerby Feb 28 '13 at 09:17
  • The problem of multi-row-for-single-record (as discussed in the first SO link I posted), is that I need to use `VARCHAR2` for your `campaign_value`, no matter what datatype nature it would be. And it's not easy to decide how long `campaign_value` should define. – Passerby Feb 28 '13 at 09:36
  • Then if needed you can have a third column called campaign_datatype which you would use to decide how to report / import the data. You would still store the data as a VARCHAR2, but use the new column to import / report. How many records are we talking here (avg / max). If it is fairly small then just make the column length 255 – acutesoftware Feb 28 '13 at 09:40
  • Can you demonstrate how to define your `campaign_name`,`campaign_value`,`campaign_datatype` columns? – Passerby Feb 28 '13 at 09:46
  • campaign_name varchar(40), campaign_value varchar(255), campaign_dtype NUMBER, -- 1=varchar, 2=number, 3=date, 4=datetime, 5=time, .... – acutesoftware Feb 28 '13 at 09:49
  • This will actually involves two tables (one for definition `program_id|column_name|column_type` for each program, one for value `record_id|column_name|column_value` for each record), which is both large in space, and hard in output, as it shifts the relationship from 1D to 2D. – Passerby Feb 28 '13 at 10:06
0

On my current job, I successfully use following system for 2 years.

You have one main table, let's say 'report', that consist of common columns for all kind of reports.

id - primary, auto_increment.

name - name of the report.

Then, for each specific report, you have another table, called something like "report_marketing". There you have report_id column, that is foreign key to first main table. And here you add all specific columns for this specific report.

To get results, you simply use LEFT JOIN.

If some reports share some columns from 2 or more tables, you can always join more than one column.

Here is example of query you might have:

SELECT report.name, report_marketing.ammount FROM report WHERE report.type = 'M'
  LEFT JOIN report_marketing ON report_marketing.report_id = report.id;
Serge Kuharev
  • 1,052
  • 6
  • 16