EDIT: Thanks for your very fast answers!
I do understand how the numbers are represented, and why such a result can be observed. My question really is about a way to make them add up to 1.0.


I have an alphabet of 4 letters A, C, G and T.
I counted them, so I have the total letter count and each individual count.

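unsigned int A_count; // initialized elsewhere
unsigned int C_count; // initialized elsewhere
unsigned int G_count; // initialized elsewhere
unsigned int T_count; // initialized elsewhere
// convert to double before summing: this enables floating point division
// and keeps the sum from overflowing unsigned int
double total_count = static_cast<double>(A_count) + C_count + G_count + T_count;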

Then I try to compute their frequencies:

double A_frequency = A_count / total_count;
double C_frequency = C_count / total_count;
double G_frequency = G_count / total_count;
double T_frequency = T_count / total_count;

But this doesn't always work for me because the sum of the frequencies can be greater than 1, and I need it to be equal to 1.0 exactly.


Example:

std::cout << "Result : " << A_frequency + C_frequency + G_frequency + T_frequency << std::endl;
Result : 1.000[...]01

I need this to generate a MEME file as documented here (MEME file format).
The relevant part reads:

As each row contains the probability of each letter in the alphabet the probabilities in the row must sum to 1.

As @TonyK pointed out in the comments, the MEME file itself doesn't need the sum to be exactly 1.0, despite what the documentation says.

But in my case, the MEME file is only created as input for another program, which needs the sum of the frequencies to be exactly 1.0.


Is there any good/pretty way to do it? If there isn't, why?

(This is my very first post on stackoverflow, if something is wrong with it, please tell me and I'll correct it, thank you)

  • @tobi303, OP reopened, as it is now clear you're asking for a way to work around floating point constraints rather than why they exist in the first place. – NathanOliver May 15 '17 at 13:30
  • As you can see from their example row `0.055556 0.000000 0.888889 0.055556`, the probabilities don't really have to add up to 1. If the software is not complaining, I think you can leave it as it is. – TonyK May 15 '17 at 14:02
  • @TonyK That is true indeed, I'll check right now – debilausaure May 15 '17 at 14:35
  • @TonyK I double checked, I need them to add up to 1.0 exactly – debilausaure May 15 '17 at 15:00

1 Answer

You could use a fraction type that stores the numerator (= count for an individual letter) and the denominator (= total count). This way you can be sure that adding up the frequencies gives exactly 1 (= total count / total count).
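A minimal sketch of such a fraction type, assuming all frequencies share the same denominator. The names `Frequency`, `value` and `is_one` are made up for illustration and do not come from any library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a frequency kept as the exact ratio
// count / total instead of a rounded double.
struct Frequency {
    std::uint64_t count;  // occurrences of one letter
    std::uint64_t total;  // occurrences of all letters (shared denominator)

    // Convert to double only at the point of use; rounding happens here.
    double value() const {
        return static_cast<double>(count) / static_cast<double>(total);
    }
};

// Adds two frequencies with the same denominator. Only the integer
// numerators are added, so no rounding error can accumulate.
inline Frequency operator+(Frequency a, Frequency b) {
    assert(a.total == b.total && "frequencies must share a denominator");
    return Frequency{a.count + b.count, a.total};
}

// True when the ratio is exactly 1, decided with integer arithmetic only.
inline bool is_one(Frequency f) {
    return f.total != 0 && f.count == f.total;
}
```

With counts 3, 3, 3, 0 and a total of 9, adding the four `Frequency` values gives 9/9, so `is_one` holds no matter how the individual ratios would round as doubles.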

Actually, I would consider whether it is really worth the effort to calculate the frequencies in the first place. You could store only the counts and divide them by the total count only when needed.

463035818_is_not_an_ai
  • Thanks for your answer, I think it is the cleanest way of doing it if you don't need to print it, but I cannot output it as a regular floating point number to the file without having the exact same issue (see my edit to the post) – debilausaure May 15 '17 at 13:54
  • @plendidSkltn because you are adding frequencies, and that cannot work precisely. If you add the counts and divide them by the total number of counts, I would be very surprised if you didn't get 1 exactly – 463035818_is_not_an_ai May 15 '17 at 13:56
  • @plendidSkltn sorry, now I got it ... you need the floating point numbers themselves to add up correctly... – 463035818_is_not_an_ai May 15 '17 at 13:57
  • @plendidSkltn how is the check that they sum up to 1 performed? Consider counts 3,3,3,0: the frequencies would be 0.333,0.333,0.333,0.0 and it would not be trivial to make them add up to 1 exactly – 463035818_is_not_an_ai May 15 '17 at 13:59
  • @plendidSkltn well, I read the docs you are referring to and imho that statement about adding up to 1 is a bit fishy. I would ask the providers of the lib what exactly they mean by that requirement, because with floating point numbers it isn't obvious what is meant by two numbers being equal. They probably use some epsilon (finite accuracy), but you need to know what it is – 463035818_is_not_an_ai May 15 '17 at 14:02
  • I guess it needs to be 0.3333,0.3333,0.3334,0.0. This is exactly the process I want to implement. Even so, 0.3333,0.3333,0.3333,0 isn't the toughest case; I could find cases where the probabilities could be 0.51,0.50,0,0 because of how floating point numbers work. – debilausaure May 15 '17 at 14:06
  • yes that would be a nice thing to know, thank you for your answers – debilausaure May 15 '17 at 14:07
  • @plendidSkltn I don't believe that any lib would ask you to do something like this. Did you read TonyK's comment? I believe that you are trying to solve a non-problem. Did you try to pass your freqs without that correction? – 463035818_is_not_an_ai May 15 '17 at 14:08
  • I double checked, I need it to be equal to 1.0 – debilausaure May 15 '17 at 14:59
  • @plendidSkltn but for floating point numbers it doesn't make much sense to require them to be equal to 1. They can only be meaningfully compared by asking for `abs(a-b) < epsilon`, and before trying to solve the problem I would try to find out what `epsilon` is used – 463035818_is_not_an_ai May 15 '17 at 15:23
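The 0.3333,0.3333,0.3334,0.0 idea from the comments can be sketched with integer arithmetic only (the largest remainder method). This is just one possible approach under stated assumptions: `apportion` and `format_units` are hypothetical names, `scale` is 10^digits (1000000 for the six decimals of the MEME example row), and the counts are assumed small enough that `count * scale` fits in 64 bits. The printed decimals then sum to exactly 1 on paper. Whether the reading program gets exactly 1.0 after parsing them back into doubles still depends on how it sums and compares, which is not known here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: splits `scale` units among the four letters in
// proportion to their counts so that the units sum to exactly `scale`.
std::array<std::uint64_t, 4> apportion(const std::array<std::uint64_t, 4>& counts,
                                       std::uint64_t scale) {
    std::uint64_t total = 0;
    for (std::uint64_t c : counts) total += c;
    assert(total > 0 && "at least one letter must have been counted");

    std::array<std::uint64_t, 4> units{};
    std::array<std::uint64_t, 4> remainder{};
    std::uint64_t assigned = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Floor of the exact share, plus the part that was cut off.
        units[i] = counts[i] * scale / total;
        remainder[i] = counts[i] * scale % total;
        assigned += units[i];
    }

    // Hand out the few missing units, largest remainder first.
    std::array<std::size_t, 4> order = {0, 1, 2, 3};
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return remainder[a] > remainder[b];
                     });
    for (std::size_t k = 0; assigned < scale; ++k, ++assigned) {
        ++units[order[k]];
    }
    return units;
}

// Formats units as a decimal with `digits` places, e.g. 3334 -> "0.3334"
// when scale is 10000 and digits is 4.
std::string format_units(std::uint64_t units, std::uint64_t scale, int digits) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%llu.%0*llu",
                  static_cast<unsigned long long>(units / scale), digits,
                  static_cast<unsigned long long>(units % scale));
    return buffer;
}
```

For counts 3, 3, 3, 0 and a scale of 10000, the floors are 3333, 3333, 3333, 0, and the single missing unit goes to one of the tied letters, so the written row sums to 1.0000 in decimal.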