
Suppose we have a vector V consisting of 20 floating-point numbers. Is it possible to insert values between each pair of these numbers so that V becomes a vector of exactly 50 numbers?

The inserted value should be a random number between the upper and lower value; for now I've decided to insert the midpoint of the two values.

I have tried the following:

std::vector<double> upsample(const std::vector<double>& in)
{
    std::vector<double> temp;
    for (std::size_t i = 1; i < in.size(); i++)
    {
        // insert the midpoint between each pair of neighbours
        double sample = (in[i] + in[i - 1]) / 2;
        temp.push_back(in[i - 1]);
        temp.push_back(sample);
    }
    temp.push_back(in.back());
    return temp;
}

With this function the output has 2n - 1 elements (20 elements become 39). The input vector can have various sizes, all smaller than 50.

I think it can be done by inserting more than one value between some pairs of elements, chosen randomly, so that the vector reaches size 50 (e.g. insert 3 values between V[0] and V[1], 1 value between V[3] and V[4], etc.). Is this possible?

Could you please guide me on how to do this? Thank you.

student_11
  • It sounds like you are searching for code that determines how many values need to be added before each new value from in. You could alternate between the floor of the ratio (desired length / input length) and its ceil. How often you use floor or ceil also depends on the ratio. This would result in a quite regular interpolation. Or do you want a random decision so it becomes irregular? – ypnos Jul 31 '19 at 09:22
  • Should the array be upsampled as uniformly as possible? I.e., is it a problem if between element n and n+1 5 items are added, but between n+1 and n+2 none? – SilverTear Jul 31 '19 at 09:30
  • @ypnos For example If V= {0.1,0.2,0.3} the desired output might be like V' ={0.1,0.12,0.14,0.17,0.2,0.27,0.29,0.3} – student_11 Jul 31 '19 at 09:51
  • @SilverTear Yes, it would be better to add at least one value between each pair, also for accuracy purposes. – student_11 Jul 31 '19 at 09:54
  • @ypnos Could you please explain this statement: "How often you use floor or ceil also depends on the ratio." – student_11 Jul 31 '19 at 10:08
  • 2
    You can check also Bresenham's line algorithm (aliasing geometric line onto pixel raster is in principle the same aliasing issue... you have 20 input values = let's say that "y-axis" and you want 50 output values = "x-axis", so you are drawing line (+50,+20) with every starting point at new line being "copy" of input, and remaining one or two points are new value generated. If you want to spread the new values across whole range as uniformly as possible. The important thing to understand is, that due to "aliasing" it will be perfect only for very specific values, so decide how to alias first. – Ped7g Jul 31 '19 at 10:09
  • Thank you, I'll check it out. It sounds like filling in the blanks by mapping some values, right? – student_11 Jul 31 '19 at 10:15
  • yes. You can use the math behind Bresenham to decide when to fill (with new generated value) and when to copy (the original input value). How you generate the filling is not related to that. If you will use simple linear interpolation (i.e. truly drawing lines :)), the largely upscaled data will show this behaviour (which may be good if the data are important and somebody may need to know which ones were original), adding some random noise (perlin noise 2D height maps for example) can help if the data are used for some visual effect, like bump mapping, etc.. – Ped7g Jul 31 '19 at 10:22
  • and the latest fashion is to use NN trained on other data containing "details" to fill up the gaps on the low resolution data, like upscaling images, etc... which slightly triggers me, when I see it, because the provided data often kinda make sense, although you should never forget they are basically random and not real. As long as people use it to upscale textures for game, it's excellent stuff, but when somebody will upscale some important data like this, and hide the information, it may bite somebody later who will be not aware of the artificial origin of the data... :) – Ped7g Jul 31 '19 at 10:25
  • @SilverTear If the input vector's length is close to 50, say 49, then we should insert just one more sample between one randomly chosen pair of elements; in that case the number of values inserted between the other elements is none. – student_11 Jul 31 '19 at 10:42
  • 2
    In digital signal processing a standard method is, if you want to scale a signal from X Hz to Y Hz, then the way to do it is to upsample to a common multiple, and then decimate down again. In this case it would be to upsample to 100 Hz, and throw every second sample away. The benefit here is that you only work in whole integers, and it is the same process no matter if you want to up- or down-sample. The problem is that it gets pretty inefficient if the to/from have few common prime factors (eg. 49 Hz to 50 Hz requires upsampling to 2450 Hz as the intermediate). – Frodyne Jul 31 '19 at 10:53
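
A minimal sketch of the counting idea from the comments above (the function name values_per_gap and the strictly even distribution are just one possible reading of the floor/ceil and Bresenham suggestions, not code from the thread): for an input of size M and a target size N there are M-1 gaps and N-M values to insert, and a running remainder decides whether a gap gets the floor or the ceiling of (N-M)/(M-1).

#include <iostream>
#include <vector>

// Distribute (n - in_size) new values as evenly as possible over the (in_size - 1) gaps,
// carrying a remainder in the spirit of Bresenham's line algorithm.
std::vector<int> values_per_gap(int in_size, int n)
{
    const int gaps = in_size - 1;
    const int to_insert = n - in_size;
    std::vector<int> counts(gaps, 0);
    int err = 0;
    for (int g = 0; g < gaps; ++g) {
        err += to_insert;
        counts[g] = err / gaps;   // floor of the running ratio
        err %= gaps;              // carry the remainder to the next gap
    }
    return counts;
}

int main()
{
    // 20 input values -> 50 output values: 30 new values spread over 19 gaps
    for (int c : values_per_gap(20, 50))
        std::cout << c << ' ';
    std::cout << '\n';
}

For 20 to 50 the counts form an alternating pattern of 1s and 2s summing to 30, so keeping the originals and inserting that many values per gap gives exactly 50 elements.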

1 Answer


So I did some math myself, because I was curious how to get the weight ratios (as if upsampling linearly up to a common multiple and then extracting only the target values from the large array, but without ever creating the large array, just using weights to know how much the left and right elements contribute to a particular value).

The sample code always creates the new value by a simple weighted average (i.e. 40% of 123.4 with 60% of 567.8 gives the "upscaled" value 390.04); no randomness is used to pepper the upscaled values (that part is left to the OP).

The ratios go like this:

if a vector of size M is being upscaled to size N (M <= N) (the "upscale" will always preserve the first and last element of the input vector; those are "fixed" in this proposal),

then every upscaled element can be viewed as lying somewhere between some original elements [i, i+1].

If we declare the "distance" between the source elements [i, i+1] to be d = N-1, then the position of an upscaled element between them can always be expressed as some j/d where j:[0,d] (when j equals d, the element sits precisely at "i+1" and can be treated as the j=0 case with source elements [i+1,i+2]).

And the distance between two upscaled elements is then M-1.

So when the source vector has size 4 and the upscaled vector should have size 5, the ratios for the upscaled elements are [ [4/4,0/4], [1/4,3/4], [2/4,2/4], [3/4,1/4], [0/4,4/4] ] of the elements (indices into the vector) [ [0,1], [0,1], [1,2], [2, 3], [2, 3] ]. (The "distance" between source elements is 5-1=4, hence the "/4" normalizing the weights; the "distance" between upscaled elements is 4-1=3, which is why the ratios shift by [-3,+3] in every step.)

I'm afraid my description is far from "obvious" (however obvious it feels in my head after figuring it out), but if you put some of it into a spreadsheet and toy around with it, hopefully it will make some sense. Or you can step through the code in a debugger to get a better feel for how the mumbling above turns into real code.
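
For instance, a throwaway loop like the following (only a quick illustration of the table above, not part of the answer's upscaler) prints the weight pairs for the size-4 to size-5 case:

#include <cstdio>

int main()
{
    const int M = 4, N = 5;        // source and target sizes from the example above
    const int d = N - 1;           // "distance" between two source elements
    int index_left = 0;            // which source pair [i, i+1] we are between
    int j = 0;                     // weight currently sitting on the right neighbour
    for (int k = 0; k < N; ++k) {
        if (j == 0)
            std::printf("out[%d] = copy of in[%d]\n", k, index_left);
        else
            std::printf("out[%d] = %d/%d * in[%d] + %d/%d * in[%d]\n",
                        k, d - j, d, index_left, j, d, index_left + 1);
        j += M - 1;                // shift by the "upscaled element distance"
        if (j >= d) { ++index_left; j -= d; }
    }
}

It reproduces the [4/4,0/4], [1/4,3/4], [2/4,2/4], [3/4,1/4], [0/4,4/4] table from the paragraph above.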

Code example 1: this one will "copy" a source element only if the weight is fully on it (i.e. with the example data only the first and last element are "copied"; the rest of the upscaled elements are weighted averages of the original values).

#include <iostream>
#include <vector>
#include <cassert>

static double get_upscale_value(const size_t total_weight, const size_t right_weight, const double left, const double right) {
    // do the simple weighted average for demonstration purposes
    const size_t left_weight = total_weight - right_weight;
    return (left * left_weight + right * right_weight) / total_weight;
}

std::vector<double> upsample_weighted(const std::vector<double>& in, size_t n)
{
    assert( 2 <= in.size() && in.size() <= n ); // this is really only upscaling (can't downscale)

    // resulting vector variable
    std::vector<double> upscaled;
    upscaled.reserve(n);

    // upscaling factors variables and constants
    size_t index_left = 0;                      // first "left" item is the in[0] element
    size_t weight_right = 0;                    // and "right" has zero weight (i.e. in[0] is copied)
    const size_t in_weight = n - 1;             // total weight of single "in" element
    const size_t weight_add = in.size() - 1;    // shift of weight between "upscaled" elements

    while (upscaled.size() < n) {               // add N upscaled items
        if (0 == weight_right) {
            // full weight of left -> just copy it (never tainted by "upscaling")
            upscaled.push_back(in[index_left]);
        } else {
            // the weight is somewhere between "left" and "right" items of "in" vector
            // i.e. weight = 1..(in_weight-1) ("in_weight" is full "right" value, never happens)
            double upscaled_val = get_upscale_value(in_weight, weight_right, in[index_left], in[index_left+1]);
            upscaled.push_back(upscaled_val);
        }
        weight_right += weight_add;
        if (in_weight <= weight_right) {
            // the weight shifted so much that "right" is new "left"
            ++index_left;
            weight_right -= in_weight;
        }
    }

    return upscaled;
}

int main(int argc, const char *argv[])
{
    std::vector<double> in { 10, 20, 30 };
//    std::vector<double> in { 20, 10, 40 };

    std::vector<double> upscaled = upsample_weighted(in, 14);
    std::cout << "upsample_weighted from " << in.size() << " to " << upscaled.size() << ": ";
    for (const auto i : upscaled) {
        std::cout << i << " ";
    }
    std::cout << std::endl;
    return 0;
}

output:

upsample_weighted from 3 to 14: 10 11.5385 13.0769 14.6154 16.1538 17.6923 19.2308 20.7692 22.3077 23.8462 25.3846 26.9231 28.4615 30

Code example 2: this one will "copy" every source element and use the weighted average only to fill the gaps in between, so as much of the original data as possible is preserved (at the price of the result not being a linear upscale of the original data set, but "aliased" to the "grid" defined by the target size):

(The code is pretty much identical to the first one, except for one if line in the upscaler.)

#include <iostream>
#include <vector>
#include <cassert>

static double get_upscale_value(const size_t total_weight, const size_t right_weight, const double left, const double right) {
    // do the simple weighted average for demonstration purposes
    const size_t left_weight = total_weight - right_weight;
    return (left * left_weight + right * right_weight) / total_weight;
}

// identical to "upsample_weighted", except all source values from "in" are copied into result
// and only extra added values (to make the target size) are generated by "get_upscale_value"
std::vector<double> upsample_copy_preferred(const std::vector<double>& in, size_t n)
{
    assert( 2 <= in.size() && in.size() <= n ); // this is really only upscaling (can't downscale)

    // resulting vector variable
    std::vector<double> upscaled;
    upscaled.reserve(n);

    // upscaling factors variables and constants
    size_t index_left = 0;                      // first "left" item is the in[0] element
    size_t weight_right = 0;                    // and "right" has zero weight (i.e. in[0] is copied)
    const size_t in_weight = n - 1;             // total weight of single "in" element
    const size_t weight_add = in.size() - 1;    // shift of weight between "upscaled" elements

    while (upscaled.size() < n) {               // add N upscaled items
/* ! */ if (weight_right < weight_add) { /* ! this line is modified */
            // most of the weight on left -> copy it (don't taint it by upscaling)
            upscaled.push_back(in[index_left]);
        } else {
            // the weight is somewhere between "left" and "right" items of "in" vector
            // i.e. weight = 1..(in_weight-1) ("in_weight" is full "right" value, never happens)
            double upscaled_val = get_upscale_value(in_weight, weight_right, in[index_left], in[index_left+1]);
            upscaled.push_back(upscaled_val);
        }
        weight_right += weight_add;
        if (in_weight <= weight_right) {
            // the weight shifted so much that "right" is new "left"
            ++index_left;
            weight_right -= in_weight;
        }
    }

    return upscaled;
}

int main(int argc, const char *argv[])
{
    std::vector<double> in { 10, 20, 30 };
//    std::vector<double> in { 20, 10, 40 };

    std::vector<double> upscaled = upsample_copy_preferred(in, 14);
    std::cout << "upsample_copy_preferred from " << in.size() << " to " << upscaled.size() << ": ";
    for (const auto i : upscaled) {
        std::cout << i << " ";
    }
    std::cout << std::endl;
    return 0;
}

output:

upsample_copy_preferred from 3 to 14: 10 11.5385 13.0769 14.6154 16.1538 17.6923 19.2308 20 22.3077 23.8462 25.3846 26.9231 28.4615 30

(notice how the "20.7692" from example 1 is here just "20", a copy of the original sample, even though at that point the "30" would carry some small weight under linear interpolation)
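
The question asked for a random value between the two neighbours rather than the exact weighted average, and the code above deliberately leaves that part open. One minimal way to add it (the helper name and the choice of jitter size, half of one weight step, are only assumptions for illustration) is a drop-in replacement for get_upscale_value:

#include <cstddef>
#include <cmath>
#include <random>

// Illustrative variant of get_upscale_value: jitter the weighted average
// inside the [left, right] interval instead of returning it exactly.
static double get_upscale_value_random(const size_t total_weight, const size_t right_weight,
                                       const double left, const double right)
{
    static std::mt19937 gen{std::random_device{}()};
    const size_t left_weight = total_weight - right_weight;
    const double base = (left * left_weight + right * right_weight) / total_weight;
    // half of one weight step keeps the result between the two original samples
    const double span = std::abs(right - left) / (2.0 * total_weight);
    std::uniform_real_distribution<double> dist(base - span, base + span);
    return dist(gen);
}

Swapping this in leaves the copy/interpolate bookkeeping of both examples untouched and only changes how the in-between values are generated.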

Ped7g
  • I really appreciate your help. It seems that weighted up-sampling gives more accurate output for my program, but I found it a bit hard to understand the details of your code. – student_11 Aug 02 '19 at 09:41
  • @student_11 just ask in the comments here about particular parts... I tried to keep it simple; if this were "production" code for my app I would change a few parts for better performance, but then it would be slightly harder to read, so it should actually read well? Which makes your feedback about which parts are difficult super valuable to me; it is always very interesting to hear how other people think about code and what is difficult for them. :) – Ped7g Aug 02 '19 at 10:29
  • But the main principle is the same as mentioned in the original comment; what Frodyne wrote is just a less efficient variant of the code I provided, and after thinking about his comment I finally (after all those decades) realized how Bresenham got the math for his line algorithm: it's actually the same principle. When you have two values M, N, both approaches start with M*N (or, optimized, a common multiple) as the "total amount of steps", because such a value can be divided without remainder into both M and N steps. So that's like the general "tick" value [0..M*N] to which everything else is relative. – Ped7g Aug 02 '19 at 10:34
  • Thanks for your valuable comments. I just used your code (with some modifications) in my project and it worked really well, so I have not debugged it yet. After debugging and diving into the Bresenham algorithm, I'll ask my real questions about the code :). Thank you again. – student_11 Aug 02 '19 at 11:39