5

I have a large array with 1024 entries that have 7 bit values in range(14, 86)

This means there are multiple range of indices that have the same value.

For example,

consider the index range 741 to 795. It maps to 14
consider the index range 721 to 740. It maps to 15
consider the index range 796 to 815. It maps to 15

I want to feed this map to a python program that would spew out the following:

if((index >= 741) and (index <= 795)) return 14;
if((index >= 721) and (index <= 740)) return 15;
if((index >= 796) and (index <= 815)) return 15;

Some code to groupby mapped value is ready but I am having difficulty coding up an expression using pairwise.

Anyone has done something similar before?

I have uploaded the dataset in two forms:

Usual, ordered by index.

Grouped by mapped value.

Nathaniel Ford
  • 20,545
  • 20
  • 91
  • 102
PoorLuzer
  • 24,466
  • 7
  • 31
  • 35
  • This is going to work on an 8 bit 8051 core. – PoorLuzer Aug 06 '11 at 04:21
  • A very bad attempt at table compression I must say. If anyone can suggest a better way (a mix of RLE and delta coding?) of compressing the dataset with comparable performance of a array lookup, it would be highly appreciated! – PoorLuzer Aug 06 '11 at 05:59
  • You only need to store a quarter of the samples (see my comment below); and don't store the offset of 50 inside the table, then you only get values in the range 0..+36. This reduces you to 256 samples * 6 bits, i.e. 87.5% compression. – smci Aug 06 '11 at 06:19
  • I am aware of the approach of exploiting wave symmetry but the table might also be made to hold some asymmetric samples, including simulated noise. However, deviation between pairs would not move more than 3 - 4 bits LSB. Otherwise, polynomial interpolation with a bit of LUT would have worked if I was interested in only a sine. – PoorLuzer Aug 06 '11 at 06:24
  • 2
    RLE coding can give you further compression on g(n)=36 for 228..255, 35 for 208..227, 34 for 195..207, 33 for 183..194, 32 for 173..182, 31 for 164..172, 30 for 156..163. Depends on your tradeoff between table size and code size. In the extreme case, you could implement full binary-search, using the compressed table. – smci Aug 06 '11 at 06:26
  • I don't think I understand the goal here. Surely you're not going to run Python code on an 8051 core? And if you're going to do calculations to decompress the table, why not just do calculations to get a value directly? You're aware of algorithms such as CORDIC, yes? – Karl Knechtel Aug 06 '11 at 06:40
  • I like your idea about RLE. Looking for something that's loss-less and combined RLE with delta coding – PoorLuzer Aug 06 '11 at 06:43
  • Actually 79%, not 87.5% compression, doh. Eliminating the 7th bit due to the +50 offset actually saves 16%. Combine that with 75% for the fourfold index reduction to 0..255. You can further eliminate the 6th bit if you RLE for 6-bit values >= 32 – smci Aug 06 '11 at 06:44
  • 1
    CORDIC will work on periodic fns, he said he also added simulated noise. It might prove more compact to separately tabulate the fn and the noise. He would need to `show us teh dataz` before we could tell ;-) – smci Aug 06 '11 at 06:45
  • @PoorLuzer, agf, let's take it to [chatroom](http://chat.stackoverflow.com/rooms/info/2181/compressing-a-sinewave-table?tab=general)? You haven't defined the criteria (table size vs code size vs number of code cycles executed) for what constitutes 'best'. – smci Aug 06 '11 at 06:52
  • CORDIC is also iterative. I will think of a way to get some data over the weekend, otherwise I will choose an answer as both are very good solutions and smci went beyond the OP. – PoorLuzer Aug 06 '11 at 06:53
  • 1
    I added the arithmetic for storing just a quarter of the table. – agf Aug 06 '11 at 07:14
  • @agf: now you have confused me with which answer to choose :-D. The original dataset was prepared minimizing the THD. Those off by one errors are not really rounding but compensation to reduce THD. I will compare the THD of the signal formed from the quarter set and report back. – PoorLuzer Aug 06 '11 at 15:21
  • @PoorLuzer, your question as asked differs from the intent... this keeps kind of morphing and is a moving target... I think it's time to close and reask as a new question (and reference this by link). – smci Aug 07 '11 at 01:13
  • I agree. I am choosing yours as the answer as you initially went beyond the OP, although as it stands now, agf and you have the same high quality answer! – PoorLuzer Aug 07 '11 at 18:57

2 Answers2

3

If you don't mind slightly different values due to rounding, I can compress that really well for you.

from math import pi, sin
interval=2*pi/1024
sinval=lambda i:int(round(sin(i*interval)*36))+50

Here is code to actually do what you want; it works with

vals = sorted((sinval(i), i) for i in range(1024))

as test data. You would need to switch the order of val and index in the for loop here if you've got indexes in the first column.

ranges, oldval, oldidx = [[0, 0]], 0, 0
for val, index in vals:
    if not (val == oldval and index == oldidx + 1):
        ranges[-1].append(oldidx)
        ranges.append([val, index])
    oldval, oldidx = val, index
ranges[-1].append(oldidx)
ranges.pop(0)
ifs = ('if((index >= {1}) and (index <= {2})) return {0};\n'.format(val, start, end)
            for val, start, end in ranges)
print ''.join(ifs)

Edit: Whoops, I was missing a line. Fixed. Also, you're multiplier was actually 36 not 35, I must have rounded (14, 86) to (15, 85) in my head.

Edit 2: To show you how to store only a quarter of the table.

from math import pi, sin

full = 1024
half = 512
quarter = 256
mag = 72
offset = 50

interval = 2 * pi / full

def sinval(i):
    return int(round(sin(i * interval) * (mag // 2))) + offset

vals = [sinval(i) for i in range(quarter)]

def sintable(i):
    if  i >= half + quarter:
        return 2 * offset - vals[full - i - 1]
    elif  i >= half:
        return 2 * offset - vals[i - half]
    elif i >= quarter:
        return vals[half - i - 1]
    else:
        return vals[i]

for i in range(full):
    assert -1 <= sinval(i) - sintable(i) <= 1

If you subtract the offset out of the table just make the first two -vals[...] instead.

Also, the compare at the bottom is fuzzy because I get 72 off-by-one errors for this. This is simply because your values are rounded to integers; they're all places that you're halfway in between two values so there is very little decrease in accuracy.

agf
  • 171,228
  • 44
  • 289
  • 238
  • This is going to work on an 8 bit 8051 core. However, I would like to know how you deduced the constants! – PoorLuzer Aug 06 '11 at 04:20
  • 1
    `(85 - 15) / 2 = 35`, and the table started and ended at 50... I'll take a look and actually answer your question in a sec. – agf Aug 06 '11 at 04:25
  • I appreciate that! Once you get the answer, compare the output of your formula above and the table I uploaded: there would be some changes in value. Hence, I would suggest you to test with the table I uploaded. – PoorLuzer Aug 06 '11 at 04:36
  • Great answer. On a different note, this does not look like to great way of "compressing" a wavetable now, does it :-( – PoorLuzer Aug 06 '11 at 05:54
  • 1
    Here's how to only store a quarter of the table. – agf Aug 06 '11 at 07:10
2

After closing, I belatedly found this solution "What's the most Pythonic way to identify consecutive duplicates in a list?".


NB: with a periodic fn like sine, you can get by by only storing a quarter (i.e. 256 values) or half of the table, then perform a little (fixed-point) arithmetic on the index at lookup time. As I commented, if you further don't store the offset of +50, you need one bit less, at the cost of one integer addition after lookup time. Hence, 79% compression easily achievable. RLE will give you more. Even if the fn has noise, you can still get decent compression with this general approach.

As agf pointed out, your f(n) = 50 + 36*sin(72*pi*n/1024) = 50 + g(n), say.

So tabulate the 256 values of g(n) = 36*sin(72*pi*n/1024), only for the range n=0..255

Then f(n) is easily computed by:

if 0 <= n < 256, f(n) = 50 + g(n)
if 256 <= n < 512, f(n) = 50 + g(511-n)
if 512 <= n < 768, f(n) = 50 - g(n-512)
if 768 <= n < 1024, f(n) = 50 - g(1023-n)

Anyway here's a general table compressor solution which will generate (istart,iend,value) triples.

I knocked my head off how to do this more Pythonically using list comprehensions and itertools.takewhile() ; needs polishing.

#import itertools

table_="""
    0       50
    1       50
    ...
    1021    49
    1022    50
    1023    50""".split()

# Convert values to int. Throw away the indices - will recover them with enumerate()
table = [int(x) for x in table_[1::2]]

compressed_table = []
istart = 0
for i,v in enumerate(table):
    if v != table[i-1]:
        iend = i-1
        compressed_table.append((istart,iend,table[i-1]))
        istart = i
    else:
        continue # skip identical values
# Slightly ugly: append the last value, when the iterator was exhausted
compressed_table.append((istart,i,table[i]))

(NB I started the table-compressor approach before agf changed his approach... was trying to get an itertools or list-comprehension solution)

Community
  • 1
  • 1
smci
  • 32,567
  • 20
  • 113
  • 146
  • Nice job. Your solution is functionally the same. I am aware of the approach of exploiting wave symmetry but the table might also be made to hold some asymmetric samples, including simulated noise. However, deviation between pairs would not move more than 3 - 4 bits LSB. Otherwise, polynomial interpolation with a bit of LUT would have worked if I was interested in only a sine. – PoorLuzer Aug 06 '11 at 06:17
  • 1
    @PoorLuzer, then maybe tabulate the high bits and low bits separately. Show us some actual sample noise data? What tradeoff of data compactness vs code do you want? – smci Aug 06 '11 at 06:32
  • 1
    Giving examples make people think of the specifics, but consider the DTMF wavetable.Don't think about the specifics of a tone being the sum of two sines, but think at a level of abstraction that one digit = 1 unique waveform. For noise, think about the wav encoding of the sound "hi". – PoorLuzer Aug 06 '11 at 06:42
  • 1
    You're off by one for 256-511 and 768-1023. – agf Aug 06 '11 at 07:38