2

In a nutshell, I programmed myself into a corner by creating a CLR aggregate that performs row id concatenation, so I say:

select SumKeys(id), name from SomeTable where name='multiple rows named this'

and I get something like:

SumKeys         name
--------        ---------
1,4,495         multiple rows named this

But it dies when SumKeys gets > 8000 chars and I don't think I can do anything about it.

As a quick fix (it's only failing 1% of the time for my application) I thought I might compress the string down and I thought some of you bright people out there might know a slick way to do this.

Something like base64 made for 0-9 and a comma?

Aaron Anodide
  • 16,906
  • 15
  • 62
  • 121
  • related: [Compressing big number (or string) to small value](http://stackoverflow.com/questions/1355100/compressing-big-number-or-string-to-small-value) – David Cary Aug 21 '11 at 14:45

6 Answers6

2

You'd be much better of if you figure out more reasonable storage for your data (maybe HashSet)...

But for compression try regular System.IO.Compression.GZipStream ( http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx ) and convert resulting byte array to base64 string if needed... or store as byte array.

Alexei Levenkov
  • 98,904
  • 14
  • 127
  • 179
2

How about a hexadecimal representation, where every digit represents a 4-bit half of a character byte (a nibble), with 0xa used as the comma? You will only get a 50% compression, but it is fast and simple.

Tony the Pony
  • 40,327
  • 71
  • 187
  • 281
1

Not sure how "fancy" you'd consider it, but zip/gzip compression is highly effective for any text (sometimes to the tune of 90% reduction or better). Since you're already working with C# and CLR integration, it hopefully wouldn't be too hard to setup/deploy. I haven't tinkered with any C# libraries for compression yet, but it's easy to find them. For example: http://sharpdevelop.net/OpenSource/SharpZipLib/ or http://dotnetzip.codeplex.com/ or even http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx

Or an easier option might be to switch your field to text or varchar/nvarchar(max), if that's feasible.

Paul Karlin
  • 840
  • 7
  • 21
1

You can use a Huffman tree. This is basically an algorithm to compress ascii into binary. I was told that it is basically what WinZIP uses, but I'm not sure if that is really true or not. I did a quick search for huffman coding c# and there seems to be at least one decent implementation out there, though I haven't used any of them.

If your "vocabulary" is just digits and commas, a Hoffman tree will get you very good compression.

http://www.enusbaum.com/blog/2009/05/22/example-huffman-compression-routine-in-c/

Jeff
  • 13,943
  • 11
  • 55
  • 103
0

try:

SELECT name, GROUP_CONCAT(id) FROM SomeTable GROUP BY name WHERE name = 'multiple rows named this'
ic3b3rg
  • 14,629
  • 4
  • 30
  • 53
0

I came across a method that will work with SQL Server:

SELECT
  STUFF((
    SELECT ','+id FROM SomeTable a WHERE a.name = b.name FOR XML PATH('')
  ),1,1,'') AS SumKeys, name
FROM SomeTable b
GROUP BY name
WHERE name = 'multiple rows named this'

The WHERE clause is optional

ic3b3rg
  • 14,629
  • 4
  • 30
  • 53