
I'm working on a class that represents numbers in scientific notation, with the goal of working around floating-point precision limits in calculations...

Typically, if we have two floating-point numbers of very different magnitudes, such as:

double a = 23456234.892;
double b = 0.00000314;
double c = a + b;

We can run into precision problems as well as rounding errors.
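
To make the problem concrete, here is a small sketch (the helper names `round_trip` and `spacing_at` are made up for illustration). It adds `b` to `a` and then subtracts `a` again; with exact arithmetic that would give back `b`, but with `double` the sum is first rounded to the spacing of representable values near `a`.

```cpp
#include <cmath>
#include <limits>

// Adds b to a, then subtracts a again. With exact arithmetic this would
// return b unchanged; with double it returns b rounded to a's spacing.
double round_trip(double a, double b) {
    const double c = a + b;
    return c - a;
}

// Distance from a to the next representable double above it.
double spacing_at(double a) {
    return std::nextafter(a, std::numeric_limits<double>::infinity()) - a;
}
```

With the values above, the spacing near `a` is 2^-28 (roughly 3.7e-9), so whatever part of `b` lies below that is lost in the sum.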

To try to combat this, I decided to design a class that does its calculations in scientific notation.

Here is the main structure of my class:

#include <cstdint>

template<uint16_t BASE = 10>
class Fpn { // Floating Point Notation
public:
    // A non-static data member cannot be constexpr, so this must be static
    static constexpr uint16_t Base{BASE};
private:
    double coefficient_;
    int64_t exponent_;

public:
    Fpn() : coefficient_{0}, exponent_{1} {}
    Fpn( double coeff, int64_t exp ) : coefficient_{coeff}, exponent_{exp} {} // signed, to match exponent_
    Fpn(const Fpn& rhs) { 
        coefficient_ = rhs.coefficient_;
        exponent_ = rhs.exponent_;
    }
    Fpn& operator=(const Fpn& rhs) {
        coefficient_ = rhs.coefficient_;
        exponent_ = rhs.exponent_;
        return *this;
    }

    // Unary Operators:
    Fpn operator+() const { return *this; }
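    // Negation: flips the sign of the coefficient.
    // An overloaded unary operator cannot take a (defaulted) parameter, so the
    // exponent-negating variant is a separate function (the name is a placeholder).
    Fpn operator-() const { return Fpn(-coefficient_, exponent_); }
    Fpn negate_exponent() const { return Fpn(coefficient_, -exponent_); }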

    // arithmetic operators:
};

Now I'm at the point where I'm working on my arithmetic operators: operator+(const Fpn&), operator+=(const Fpn&), operator-(const Fpn&), operator-=(const Fpn&), operator*, operator*=, operator/, operator/=...

Before two values can be added or subtracted, they have to have the same exponent...

I was thinking of creating a private helper function to do this conversion and then calling it from my +, +=, -, and -= operators...

Within that function, I'm thinking along the lines of figuring out which operand has the greater exponent, and then converting the lower-order object to match the higher-order one...

private:
    void convert(Fpn& rhs) {
        if(exponent_ == rhs.exponent_) return; // Don't need to do anything

        // This will tell me which one has the higher exponent
        auto higher_order = (exponent_ > rhs.exponent_) ? exponent_ : rhs.exponent_;

        // This is where I'm getting a bit stumped...
        auto difference = higher_order - /* the smaller of the two... another ternary check? */
        // I'm checking negative against positive so that abs() doesn't have to be called...
        // or would that not matter, and should I just use `abs()` regardless?

        // Also, once I figure out which one needs to be converted,
        // I then have to convert the correct side, RHS vs. LHS...
        // ????

        if (difference < 0) { // negative case
            coefficient_ *= (abs(difference)*Base);
            exponent_ /= (abs(difference)*Base);
        }
        if (difference > 0) { // positive case
            coefficient_ *= difference * Base;
            exponent_ /= difference * Base;
        }
    }

If the LHS's exponent is greater than the RHS's, then I want to convert the RHS to match the LHS, and then do the same the other way around if the RHS's exponent is greater...
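
The alignment described here could be sketched roughly as follows. This is only an illustration under some assumptions: `Sci` is a hypothetical stand-in for `Fpn` with public members so the snippet compiles on its own, `align` is a made-up name, and the two exponents are assumed to be close enough that their difference does not overflow `int64_t`.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for Fpn<BASE>, with public members so the sketch
// is self-contained.
template <std::uint16_t BASE = 10>
struct Sci {
    double       coefficient;
    std::int64_t exponent;
};

// Rewrites whichever operand has the smaller exponent so that both share
// the larger one. Raising an exponent by d means dividing the coefficient
// by BASE^d, which keeps the represented value the same up to rounding.
template <std::uint16_t BASE>
void align(Sci<BASE>& lhs, Sci<BASE>& rhs) {
    if (lhs.exponent == rhs.exponent) return; // already aligned

    Sci<BASE>& lower  = (lhs.exponent < rhs.exponent) ? lhs : rhs;
    Sci<BASE>& higher = (lhs.exponent < rhs.exponent) ? rhs : lhs;

    // Non-negative by construction (assumes no int64_t overflow).
    const std::int64_t difference = higher.exponent - lower.exponent;

    lower.coefficient /= std::pow(static_cast<double>(BASE),
                                  static_cast<double>(difference));
    lower.exponent = higher.exponent;
}
```

For example, aligning 2.5x10^3 with 4.0x10^1 leaves the first untouched and turns the second into 0.04x10^3. Note that this only matches the exponents; the coefficients are still doubles, so a later addition of the two coefficients rounds the same way a plain double addition would.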


Please do not suggest std::scientific I know that exists and that's not what I'm after.


Would I then have to use another ternary operator to get the smaller of the two? Or, without performing a bunch of conditional checks, are there any suitable algorithms in the STL, or some kind of lambda expression, that would help with this part of the process?

Also, I'm calculating the difference between the exponents and then checking whether that difference is negative or positive: if it's negative I call abs() from <cmath>, and if it's positive I don't. As a side question related to this function's design: would it be better to skip the conditional check and just call abs() regardless, or is it better to branch between the two cases?
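
One possible way to avoid both the second ternary and the sign check is `std::minmax` from `<algorithm>`, which returns the smaller and the larger value as a pair. A minimal sketch (the function name `exponent_gap` is made up, and it assumes the two exponents are close enough that the subtraction does not overflow):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Returns how many orders of magnitude separate two exponents.
// std::minmax orders the pair, so the result is never negative and
// no abs() call or sign branch is required.
std::int64_t exponent_gap(std::int64_t a, std::int64_t b) {
    const std::pair<std::int64_t, std::int64_t> ordered = std::minmax(a, b);
    return ordered.second - ordered.first; // larger minus smaller
}
```

Since the larger value is always on the left of the subtraction, the argument order does not matter: `exponent_gap(3, -2)` and `exponent_gap(-2, 3)` both give 5.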

There is another helper function that I still need to write, which some of these operators will call after their arithmetic to convert the final result into a value whose coefficient lies in [1, 10) if the Base is 10. That function isn't written yet, as I need to finish this convert() function first.
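
For what it's worth, that normalization step might look something like the sketch below. It is an assumption-laden illustration, not the final design: `Sci` is again a hypothetical stand-in for `Fpn` with public members, `normalize` is a made-up name, and zero and non-finite coefficients are simply left alone.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for Fpn<BASE>, with public members so the sketch
// is self-contained.
template <std::uint16_t BASE = 10>
struct Sci {
    double       coefficient;
    std::int64_t exponent;
};

// Shifts the coefficient until 1 <= |coefficient| < BASE, adjusting the
// exponent in the opposite direction so the value stays the same.
template <std::uint16_t BASE>
void normalize(Sci<BASE>& x) {
    static_assert(BASE >= 2, "a base below 2 cannot be normalized");

    // Zero has no normalized form; infinities and NaN are left untouched.
    if (x.coefficient == 0.0 || !std::isfinite(x.coefficient)) return;

    const double base = BASE;
    while (std::fabs(x.coefficient) >= base) {
        x.coefficient /= base;
        ++x.exponent;
    }
    while (std::fabs(x.coefficient) < 1.0) {
        x.coefficient *= base;
        --x.exponent;
    }
}
```

The loops are simple rather than fast; a logarithm could compute the shift in one step, at the cost of handling its rounding near exact powers of the base.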

I know what I would like to do, but I'm struggling a bit with design decisions while trying to use the appropriate techniques and algorithms.

I appreciate any and all feedback, tips, suggestions, etc.

Daniel A. White
Francis Cugler
  • Isn't this what floating point already _is_? A significand, and an exponent. The only difference here is the base. So, are you perhaps looking for the [`std::decimal`](https://stackoverflow.com/q/12865585/4386278) extension? Or https://launchpad.net/std-decimal. Or decNumber++. Or... – Asteroids With Wings Aug 19 '20 at 11:46
  • @AsteroidsWithWings Not exactly... I'm trying to extend the idea of floating-point. The intention of this class is that the coefficient has the size of a double and the exponent has the size of an `int64_t`. I also had a typo in the `exponent` type: it was supposed to be `int64_t` and not `uint64_t`... And the base is templated... – Francis Cugler Aug 19 '20 at 11:48
  • @AsteroidsWithWings I can then take a value of say 23452228342.20 and add 0.00000000003002343 to it and still maintain the value as long as each part fits within their precisions. – Francis Cugler Aug 19 '20 at 11:50
  • Right, which is what decimal floating-point types do. – Asteroids With Wings Aug 19 '20 at 11:51
  • @AsteroidsWithWings On a 64-bit machine, a double is typically 8 bytes or 64 bits, and only so many bits are reserved for the mantissa, a bit for the sign, and so many bits for the exponent depending on the convention they use, such as the IEEE-754 standard... Here I'm extending the fact that the base has 64 bits and that the exponent has 64 bits. – Francis Cugler Aug 19 '20 at 11:54
  • Yes, you said _"as long as each part fits within their precisions"_. For the example you gave, the exponents are _well_ within the precision of the exponent in a `std::decimal::decimal128`. Like, by two orders of magnitude. Why do you think you need exponents up to ±9,223,372,036,854,775,807? That seems unlikely. – Asteroids With Wings Aug 19 '20 at 11:56
  • Why write such library yourself and not use one of existing ones? I would suggest making `double coefficient_;` an integer. Your `void convert(Fpn& rhs)` is doing a normalization of floating point number, right? – KamilCuk Aug 19 '20 at 11:56
  • @AsteroidsWithWings I'm writing this as a class for my own personal library! The question is about writing the algorithm within the convert() function. – Francis Cugler Aug 19 '20 at 11:57
  • @FrancisCugler Okay. – Asteroids With Wings Aug 19 '20 at 11:57
  • @KamilCuk Person A has a recipe for making Chocolate Chip Cookies, Person C has their recipe for making Chocolate Chip Cookies... Just because I can! – Francis Cugler Aug 19 '20 at 11:57
  • @KamilCuk Well I'll be using this with my other classes within my libraries, and it's the types and how I'll be using it that is of interest to me. It's also a good practice. I like to first write and design an algorithm to have a better understanding of how they work before I just go and start using one from a library assuming it will do what I want. – Francis Cugler Aug 19 '20 at 12:01
  • @AsteroidsWithWings You asked why I would need exponents of up to +/-9...... and said that seems unlikely. I'm planning on using it in a simulator for doing calculations both at the Planck scale and at the cosmic scale, and I'll be having several zoom layers! – Francis Cugler Aug 19 '20 at 12:06
  • Concerning checking the sign and then using `abs` or not: consider that either `abs` already checks for signedness (and returns the original value when it was positive), or it doesn't (because it merely clears a sign bit), which is even better. In any case you don't need to check the sign before calling `abs`. – 463035818_is_not_an_ai Aug 19 '20 at 12:07
  • @idclev463035818 Thanks for the clarity there... sometimes just need a reminder! – Francis Cugler Aug 19 '20 at 12:07
  • However, the note here might be relevant for you: https://en.cppreference.com/w/c/numeric/math/abs – 463035818_is_not_an_ai Aug 19 '20 at 12:08
  • You need exponents of the order of 30, 40, 50 for Planck scale. Not 9 billion billion billion. – Asteroids With Wings Aug 19 '20 at 12:08
  • @AsteroidsWithWings But I'm going beyond Planck, I'm pushing the limit of limits! Why? Just because I can! It's not written in stone that the `Planck` length is the smallest length! You can always divide by 2 and never reach 0, and you can do this infinitely and still never reach 0... So who knows, maybe having that much precision might invent a whole new field of mathematics and physics! – Francis Cugler Aug 19 '20 at 12:12
  • _"It's not written in stone that the Planck length is the smallest length!"_ Well, it is... – Asteroids With Wings Aug 19 '20 at 12:16
  • @AsteroidsWithWings I'll challenge anything and everything! There are no limits! However, my library isn't just based on `floating-point` precisions, it is also based on various number systems... So if the user templates this with say <16> then it will do log 16 math... if its set to <2> it will do binary base math... Well, the math is currently done in decimal for now. Once I get the core functionality working, then I'll worry about doing math in other bases... – Francis Cugler Aug 19 '20 at 12:36
  • @AsteroidsWithWings This class won't just represent 1.3x10^3 + -9.24x10^-6, it will also be able to represent 4Ax16^-3A + -FCx16^2B, etc... – Francis Cugler Aug 19 '20 at 12:37
