
I have only just started learning feature hashing, so I need help understanding whether I can use the function expressed mathematically as the Tent Map (https://en.wikipedia.org/wiki/Tent_map) as the hash function.

One such application of the Tent Map is in cryptography -- Paper 1: "Implementation of Hash Function Based On Neural Cryptography" (download link).

In feature hashing, let x be a data point of dimension D, i.e., it has D elements. A linear hash function is used to transform the D-dimensional data point into a lower, k-dimensional one in such a way that distances in the reduced feature space are preserved. The k-th hash bit is obtained through the operation

    h_k(x) = sign(y(x)) = sign(f(w_k^T x + b))        (1)

The output h_k(x) is a 0 or 1 bit.
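For concreteness, here is a minimal toy sketch of what I understand by this random-hyperplane hashing (my own illustration, not taken from the papers; I am assuming f is the identity and that the hyperplanes w_k and offsets b are drawn from a standard normal distribution):

import numpy as np

def random_hyperplane_hash(x, k, seed=0):
    """Map a D-dimensional point x to k bits via sign(w_k^T x + b)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, x.shape[0]))   # one random hyperplane per hash bit
    b = rng.standard_normal(k)                 # one random offset per hash bit
    return (W @ x + b >= 0).astype(int)        # 1 on the positive side, 0 otherwise

x = np.array([0.2, 0.7, 0.1, 0.9])             # a toy 4-dimensional data point
print(random_hyperplane_hash(x, k=3))          # prints 3 bits, each 0 or 1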

In essence, we are classifying whether the data point x belongs to the 0 class or the 1 class by creating random hyperplanes.

There are various choices of hash function in feature hashing for dimensionality reduction: f = tanh(), or simply random sampling to obtain the hyperplanes. Another choice is to use kernel functions when the data is not linearly separable; such a hashing technique is implemented using kernels, and one popular choice is the Gaussian RBF kernel.
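For reference, a minimal sketch of the Gaussian RBF kernel mentioned above (my own illustration; the bandwidth sigma is a free parameter that I am assuming here):

import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF similarity: exp(-||x - y||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))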


  • Question: In Paper 1, the authors use the asymmetric Tent Map (https://en.wikipedia.org/wiki/Tent_map), which is piecewise linear on the unit interval, as the transfer function. To me, the formulation of hashing in that paper using the Tent Map appears similar to hash equation (1). How can I apply this piecewise linear map, i.e., use it to create hyperplanes, in order to do feature hashing?

    Or am I mixing the two concepts?

Sm1

1 Answer


Feature hashing needs a hash function, so that it will be able to do ... its hashing trick!

It will hash every image, thus you want a hash function that will take an image (i.e. a feature vector of D dimensions) and produce a single integer value.


Note: I think you are confused by the second, single-bit-output hash function ξ, i.e. by the fact that its result is binary. ξ() is easy to grasp once you get the flow of the initial approach, since it is just an optimization to decrease hash collisions.


Now let's take a look at how the Tent Map behaves:

[Plot of the Tent Map: a piecewise linear function on the unit interval, rising to a peak and falling back to zero]

which, as you see, is a real-valued function.

It takes one number as input and gives one number as output. As a result, a naive Tent Map cannot hash a vector (an image, in our case). There are many ways to augment the Tent Map to handle a vector, the simplest being:

Implement a Tent Map function, e.g. tentMap(), that implements the logic of the Tent Map. You could treat the values returned by the Tent Map that are less than 0.5 as 0 and the rest (i.e. >= 0.5) as 1, but you could do more complex things if you like.
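A minimal sketch of such a tentMap(), assuming the standard map with parameter mu = 2 and breakpoint 0.5 on the unit interval (the asymmetric variant from the paper would only move the breakpoint), thresholded at 0.5 as described above; the input is assumed to be normalized to [0, 1]:

def tentMap(x, mu=2.0, threshold=0.5):
    """One Tent Map step on the unit interval, reduced to a single bit."""
    y = mu * x if x < 0.5 else mu * (1.0 - x)   # the tent: rises, then falls
    return 1 if y >= threshold else 0           # binarize as suggested above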

Now for an image, you could do:

def tentMapVector(image):
    results = []
    for value in image:                 # 'image' is treated as a flat D-dimensional vector
        results.append(tentMap(value))  # one 0/1 value per element
    # 'results' now contains D integers; they need to be hashed down to a
    # single integer, e.g. by joining them into a string and hashing that
    # string with any string hash function (Python's built-in hash() here)
    return hash(''.join(str(r) for r in results))

and you are ready to go! tentMapVector() should replace hash() in Wikipedia's implementation. You may want to read its example; it is really helpful.
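For reference, here is a rough sketch in the spirit of Wikipedia's hashing-trick pseudocode (my paraphrase, not a verbatim copy), with the hash function left as a parameter so you can plug in whatever you implement:

def hashing_vectorizer(features, N, hash_fn=hash):
    """Bucket each feature into one of N slots using hash_fn (the hashing trick)."""
    x = [0] * N
    for f in features:
        h = hash_fn(f)          # any function mapping a feature to an integer
        x[h % N] += 1           # accumulate into the corresponding bucket
    return x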


Feature hashing doesn't seem to require dimensionality reduction; it should work without it. But when your data live in a high-dimensional space, which is usually the case with images, you should try reducing the dimensions (for example, from 256 to 128).

You could, for example, use PCA() from scikit-learn, or any other library.
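For example, a minimal sketch with scikit-learn's PCA (the 256 -> 128 reduction just mirrors the numbers above; the random matrix is only a stand-in for your images):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 256)        # 1000 samples with 256 features (placeholder data)
pca = PCA(n_components=128)          # keep 128 principal components
X_reduced = pca.fit_transform(X)     # shape: (1000, 128)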

gsamaras
  • Thank you for your answer, but many things are not clear. (1) Could you please elaborate on your last sentence, "Since you want your features to map to 0 or 1, you could treat the values that are less than 0.5 as 0 and the rest (i.e. >= 0.5) as 1."? Mathematically, if the input pattern of an image is x in R^D, then how can I apply the Tent Map to map the features to 0 and 1? (2) The papers that you linked apply the Tent Map to cryptography, whereas feature hashing is different -- it involves 2 steps: (a) dimension reduction, and (b) mapping the result of (a) to {0,1}. – Sm1 Sep 27 '16 at 16:18
  • I am having a hard time picturing / connecting the Tent Map with feature hashing. I would really appreciate more insights from you. Thank you for your effort. – Sm1 Sep 27 '16 at 16:18
  • @Sm1 forgive me for taking too long to answer, but my girlfriend is dumping on me, and you know... Anyway! Thanks for giving me good feedback; you see, I don't really know the answer by heart, but I am trying to discover it with you, which is why I upvoted your question. So here is what I know from my experience: an image is a D-dimensional vector, in R^D. We want to apply dimensionality reduction to that vector and then feed the dim-reduced vector to the Tent Map, right? However, I am not sure why we want the result of the Tent Map to be binary? I don't see that in the Feature Hashing Wikipedia article. – gsamaras Sep 27 '16 at 20:50
  • @Sm1 can you please also read the Wikipedia page I linked to? I mean, that's the way I feel, but we can specialize it to be a binary code in the end, if you want. Your sentence about the input of the Tent Map seems 'wanted': from what I understand, it accepts a number and outputs a number. Do you agree? I don't understand your last sentence. – gsamaras Sep 27 '16 at 22:02
  • That's a different question @Sm1; I think what you are asking is how to use the Tent Map for feature hashing, right? If so, let me know, so that I can update my answer! ;) – gsamaras Sep 27 '16 at 22:14
  • I thank you for posting a good question, answer updated! @Sm1 – gsamaras Sep 27 '16 at 22:53
  • Thanks for the update but I am trying to visualize what the Tent Map is actually doing. In general am I creating a chaotic image? What is the physical meaning of using it for feature hashing? – Sm1 Sep 28 '16 at 16:25
  • That's a good question @Sm1, but it's a different one. I don't know if Tent Map is the optimal hash function for Feature Hashing. – gsamaras Sep 28 '16 at 17:36
  • Thanks for your help. – Sm1 Sep 28 '16 at 17:46
  • You are welcome @Sm1. I can see that you accepted my answer, but did not upvote it. I did try my best. :) However, I would suggest un-accepting the answer, so that we give other people time to help too, as long as the bounty is still active - i.e. I advise you to accept an answer when the bounty ends! ;) – gsamaras Sep 29 '16 at 18:11
  • Great @Sm1 let's see if someone else has anything to say. I also modified your question so that it bumps! ;) – gsamaras Oct 01 '16 at 18:42
  • Oh damn it @Sm1, I got half of the bounty because you didn't accept the answer just before the termination of the bounty; maybe I wasn't clear enough! :/ – gsamaras Oct 04 '16 at 13:41
  • @Sm1 is there a reason for not accepting the answer *again*? I mean I already lost the bounty... – gsamaras Nov 07 '16 at 00:09