1

I am working on a health app and I'm designing a database (MySQL), which should store symptoms and diseases and bodyparts. The app will work as follows:

  1. User chooses gender, age and bodypart (front/back)
  2. App shows all (common/less common) symptoms of chosen bodypart
  3. User chooses symptom
  4. App shows all (common/less common) diseases of chosen symptom.
  5. User chooses disease
  6. App shows disease information

I got some help so far, but I still need some help finding a solution:

  • Making weight in symptom_disease dependent on the selected age and gender (order of the listed diseases should depend on the selected age and gender).
  • Some symptoms should have additional symptoms, that the user can choose as an extra. So for example, when the user chooses 'Head' -> 'Behavioral disturbances' (common), the app should display 2 extra checkboxes 'Depressed' and 'Drugs abuse'. The order of diseases list should depend on these inputs.

Note: The weight determines if a symptom/disease is common or less common. The common symptoms/diseases are listed above the less common symptoms/diseases.

Database design

Engo
  • 899
  • 3
  • 19
  • 49
  • 'd_threatment' should be 'd_treatment' – Engo May 21 '15 at 12:44
  • The adjustments can be done in your queries or as a function since it can be a complicated calculation. Or you can store them as extra columns or as separate mapping table but it will make queries more complicated and slow. – Tim3880 May 21 '15 at 12:51
  • What would be the best way to do such things? – Engo May 21 '15 at 12:57
  • I would define a small function taking three parameters, age, gender,weight and returning the adjusted weight. If your adjustment is very simple, you might be able to use Case directly in your query, but maintenance will be difficult. – Tim3880 May 21 '15 at 13:11
  • It would be hard since not all diseases will have the same calculation. There are only 4 age categories. I think it should be better to store it somehow in the database... – Engo May 21 '15 at 13:16
  • You can pre-calculate them and store in an extra table. Using function can actually hide those details so you don't have to change your queries later. – Tim3880 May 21 '15 at 13:22
  • How would my database look like if I want to store it? The weight should depend on age, gender and the chosen additional symptoms. – Engo May 21 '15 at 13:25
  • If you need also refer to any other symptoms, you need a function. It will be helpful if you post your algorithm to calculate the weight based on given age, gender, symptoms. – Tim3880 May 21 '15 at 14:16
  • I don't have an algorithm, I just have all combinations possible. – Engo May 21 '15 at 15:30
  • can you show us some sample data? – Tim3880 May 21 '15 at 15:57
  • Here, the first 2 records of the database: http://i.imgur.com/WYECk5Q.png. The database is not efficient and has 180000 rows. – Engo May 21 '15 at 16:34
  • i think you should store the weight in a separate table for per disease, symptom, age group, gender. The possible diseases should be displayed in the order of total weight of all matched symptoms. – Tim3880 May 21 '15 at 19:03
  • 1
    I can address this on the weekend, please stand by. – PerformanceDBA May 22 '15 at 05:56
  • @PerformanceDBA Please don't forget to take a look at my question! :) – Engo May 23 '15 at 23:11
  • 1
    Of course not. How could I forget an earnest guy like you. Besides, I always keep my word. – PerformanceDBA May 24 '15 at 02:44

2 Answers2

2

The weight determines if a symptom/disease is common or less common. The common symptoms/diseases are listed above the less common symptoms/diseases.

That is fine. A bit simplistic, but fine. That is ranking by weight or ORDER BY weight.

Making weight in symptom_disease dependent on the selected age and gender (order of the listed diseases should depend on the selected age and gender)

That tells me the result you want (which is also fine). But the data that is required to produce that result ...

  • weight per disease per symptom per age, or

  • weight per disease per symptom per age per gender

... is absent. You cannot produce something from nothing.

Concept

The whole concept is immature. (I am, of course, speaking about the party who gave you this job, not you, who is trying to implement it.) Eg. we know that:

  • white people who live in sun-drenched countries suffer melanoma, etc, that the locals don't

  • south-east Asians who live in Western society have more diabetes and heart disease than they do in their own countries

  • many diseases, eg. breast cancer, have a hereditary factor

The point is, the concept is very simplistic. When that simplistic approach is thought through carefully, hopefully by medically-trained and experienced professionals, they will come up with something that is much more indicative than a symptom_disease.weight, which then has to be modulated up or down based on age and gender. They will come up with many factors that influence weight, age and gender being just two such factors.

After the system is implemented, if not before, you are going to need:

  • many more factors or categories

  • that must each have either a weight value or a factor value

  • which (a) the user chooses, and (b) influences the end result (probability of the disease)

Further, during such resolution:

  • some of those factors might turn out to be a weight (another weight, at that factor level) rather than a factor

  • during that process, it may turn out that the symptom_disease.weight is actually an intermediate result, not a stored value at the symptom_disease level

  • that might lead to weight at the disease level, that is modulated by each factor.

Result Weight

Once the data is structured and organised, obtaining resulting weight based on the user's choice of symptoms, categories, and factors is easy. It is just a single SELECT against the database.

  • I would use Dynamic SQL to construct the query, and then execute it, because the alternative is to code every possible query, and select the one that is relevant to the user request.

  • However, for a starting point, for proof of concept, just code a few separate queries.

You don't need a function on the database side. On the code side, a function is a good idea, because it eliminates duplicate code.

Record ID

This creature poses many problems, and guarantees a non-database. Please read this Answer. You may find the whole post useful, if not, the minimum you need is from the top, up to and including False Teachers.

Likewise, prefixing the column names with a file identifier is an error. In SQL, you can qualify a column name, as and when required, with the file name. Instead of s_name and bp_name:

    SELECT  symptom.name,
            bodypart.name,
            ...

Data Model

The most important issue at this early stage, is to provide a database that is easy to extend, as per those expected changes, which means, pure Relational and adherence to Standards. You must place all data; rules about data; everything that relates to data, in the database (not in code).

Symptom Disease Data Model (Demonstrative)

In any case, right now, given what you have described, and considering my explanation above, we have two options. I have provided a separate data model for each.

1 Modulated Weight

Your client provides a weight per disease per symptom only, plus factors per age, gender, etc.

  • The factors will be used to modulate the weight.

  • Notice, the age Key is an age range, eg. 1-18; 19-25; 26-35; 35-55; etc.

  • Factor is a REAL between 0 and some reasonable maximum (eg. 100). Eg. 1.0 for age means the weight is unchanged with age; 0.5 means the weight is halved with age; 2.0 means the weight is doubled with age; etc.

  • The vectors or Dimensions are across the top. As and when new factors are identified (eg. Ethnicity), you can add another factor (Dimension), in the same way that age and gender have been added.

2 Specific Weight

Your client provides a specific weight per disease per symptom per age per gender, etc. There are no factors.

Discussion

In both models, when tables are added as and when new factors are identified, the core (five tables, top centre) do not change. However, in the second model, I have differentiated age and gender on this basis:

  1. Gender is modelled as a definitive, a Dimension, and thus I have worked it through the core tables, as the Relational Key. Relational Keys are important for navigation (as in reducing it) and providing great power. Here in addition to that, I am saying gender is definitive.

    The corollary is, you do not want definitive factors to be excluded from the Key of the core tables, because it will reduce navigation ability and speed.

  2. Age is modelled as an additional Dimension, that can be added easily, after the core tables are stable, as an when new factors are identified.

One is not better than the other. The point I am making here is, and this applies to both models, but it is more visible in the Specific Weight model, the Dimensions that comprise the core tables need to be thought out carefully. Not because anything that you work into the Relational Keys are more difficult to change (and has an impact of existing code), but the other way around, that what is genuinely a definitive part of the core tables, that forms the core symptom::disease indicator, is stable. It must be determined and modelled early.

  • Eg. I don't understand why you have symptom as being a child of bodypart, I can imagine symptoms that are not specific to a bodypart, but I will leave that as is for now.

    • The ugly-as-sin fix without changing the structure is to have a special bodypart indicating the whole body.

    • Rather than { Head/Behavioural/...}, think about categories for symptom categories such as { Physical | Mental | Behaviour }. Or Behaviour { Smoking | Alcohol } may be a separate dimension that contributes to disease.

  • Eg. I can imagine symptoms that are specifically male or female only. I have modelled gender two ways, so that you can evaluate the difference, and the impact:

    • Divorced and isolated in 1 Modulated Weight

    • As a classifier (and therefore Identifier) of bodypart, symptom,and disease in 2 Specific Weight).

The point of all that is:

  1. The definitive factors need to be exposed; evaluated; determined, early, such that the core tables are stable.

  2. All factors that are relevant to disease diagnosis must be identified and implemented. Not merely Age and Gender, but Ethnicity; Hereditary diseases; Behaviour; Location; etc.

Please feel free to comment, ask questions, etc.

Community
  • 1
  • 1
PerformanceDBA
  • 32,198
  • 10
  • 64
  • 90
  • 1
    Firstly, thank you for your detailed answer again! Please take a look at the sketch of the application: http://i.imgur.com/p9QP1av.png and the raw data: http://i.imgur.com/BFucG2J.png The database is not designed to be used in an application. Its provided by a medical foundation. As you can see, the order of the diseases in the sketch is equal to the order of the diseases in the database. My goal is to design a database to continue the development of the application like in the sketch. I do this job as a volunteer. I hope this will clarify the question/goal of the application much better.. – Engo May 24 '15 at 11:04
  • 1
    @Engo. (a) My pleasure. In light of your employment status, please be assured that I have similar interests. I would be quite happy to provide db advice, to the end. (b) What reads the db ? (c) If the db is provided, why don't you use it as is ? – PerformanceDBA May 24 '15 at 14:41
  • Thank you for your interest. The database is created by medical doctors and is not intended for this application. I am asking because I believe the database is not really efficient to use for my application (it has 190k of rows). – Engo May 25 '15 at 12:38
1

There is a method you can use to break a spectrum of choices into categories according to spectrum units, in this case age.

create table Symptom_Disease(
    Sex       char( 1 ) not null,
    Age       smallint not null,
    Weight    smallint not null,
    BodyPart  int not null references BodyPart( BP_ID),
    SymptomID int not null references Symptoms( S_ID ),
    DiseaseID int not null references Disease( D_ID ),
    constraint PK_Symptom_Disease primary key( Sex, Age )
);

You have the sex, age, body part and symptom when you query:

select  sd.Sex, sd.Age, d.*, sd.weight
from    Symptom_Disease sd
join    Disease d
    on  d.D_ID = sd.D_ID
where   sd.Sex = :Sex
    and sd.BodyPart = :BodyPart
    and sd.Symptom = :Symptom
    and sd.Age =(
            select  Max( Age )
            from    Symptom_Disease
            where   Sex = sd.Sex
                and BodyPart = sd.BodyPart
                and Symptom = sd.Symptom
                and Age <= :Age )
order by weight;

For example, you might have entries (of various weights) for ages 0, 2, 12, 21, 35 and 70. The ages represent the lower cutoff for the age group. So the first record is for ages 2 or less, the second record is for ages 12 and younger down to but not including 2 and so forth. If your patient's age is 25, this query will return all the diseases of each weight for each set of (Sex, Body Part, Symptom) for age 21 -- it being the highest age less than or equal to the patient's age.

Just change sd.Symptom = :Symptom to sd.Symptom in( :S1, :S2,...) to allow for multiple symptoms. If so, you might want to add symptom to the order by clause.

TommCatt
  • 5,498
  • 1
  • 13
  • 20
  • Thank you for your answer. Please take a look at my comment including the sketch and a screenshot of the raw database. I hope it will explain my question in more detail.. – Engo May 24 '15 at 11:05
  • Does my answer point you toward a solution? If not -- if I've missed something or misunderstood something -- could you please just state how it is lacking? – TommCatt May 25 '15 at 23:38
  • I think you've missed the 'symptoms that apply' in your answer? That is the reason I referred you to my comment... I've explained my new database and my problem in my last comment on user PerformanceDBA. – Engo May 26 '15 at 12:33
  • Yes, you are absolutely correct, I have missed the 'symptoms that apply' in my answer. In fact, I missed 'symptoms that apply' or even the word 'apply' anywhere in your question. I think at this point you should spend some time in the tutorial of this site and read up on how to ask a question. Top of this page, Help -> Tour. – TommCatt May 26 '15 at 16:35