16

I'm trying to extend a matching matching algorithm across a sequence. My matches are 20 units long and have 4 channels at each timepoint. I have built a model that encapsulates the matching, I just can't figure out how to use that in a sliding window to apply it across a longer sequence to find the matches within the sequence.

I have 2 (20, 4) input tensors (query and target) that I concatenate, add, flatten, and then apply a simple dense layer. I have data at this stage to train with 100K query, target pairs.

def sum_seqs(seqs):
    return K.sum(seqs, axis=3)

def pad_dims(seq):
    return K.expand_dims(seq, axis=3)

def pad_outshape(in_shape):
    return (in_shape[0], in_shape[1], in_shape[2], 1)


query = Input((20, 4))
query_pad = Lambda(pad_dims, output_shape=pad_outshape, name='gpad')(query)

target = Input((20,4))
target_pad = Lambda(pad_dims, output_shape=pad_outshape)(target)

matching = Concatenate(axis = 3)([query_pad, target_pad])
matching = Lambda(sum_seqs)(matching)

matching = Flatten()(matching)
matching = Dropout(0.1)(matching)
matching = Dense(1, activation = 'sigmoid')(matching)

match_model = Model([query, target], matching)

This works perfectly. Now I want to use this pre-trained model to search a longer target sequence with varying query sequences.

It seems it should be something like:

long_target = Input((100, 4))

short_target = Input((20, 4))
choose_query = Input((20, 4))

spec_match = match_model([choose_query, short_target])

mdl = TimeDistributed(spec_match)(long_target)

But TimeDistributed takes a Layer not a Tensor. Is there a wrapper I'm missing? Am I going about this the wrong way? Do I need to reformulate this as a convolution problem somehow?

Continued experimentation: After a day of beating my head against the keyboard it is clear that both TimeDistributed and backend.rnn only allow you to apply a model/layer to a single time-slice of the data. It doesn't seem like there is a way to do this. It looks like the only thing that can "walk" across multiple slices of the time dimension is a Conv1D.

So, I reframed my problem as a convolution but that doesn't work well either. I was able to building a Conv1D filter that it would match a specific query. This worked reasonably well and it did allow me to scan longer sequences and get matches. BUT each filter is unique to each query tensor and there doesn't seem to be a way to go from a novel query to the appropriate filter weights without training a whole new Conv1D layer. Since my goal is to find new querys which match the most targets this doesn't help much.

Since my "matching" requires the interaction of the target AND the query at each window there doesn't seem to be a way I can get an interaction of a 20-length query tensor at each window across a 100-length target tensor through Conv1D.

Is there any way to do this sliding window type evaluation in Keras/tensorflow? It seems like something so simple yet so far away. Is there a way I can do this that I'm not finding?

Responses and further experimentation.

The solutions from @today and @nuric work but they end up replicating the input target data in a tiling type fashion. So, for a query of length m there will be a little under m copies of the input data in the graph. I was hoping to find a solution that would actually "slide" the evaluation across the target without the duplication.

Here's a version of the Conv1D almost solution I came up with.

query_weights = []

for query, (targets, scores) in query_target_gen():
    single_query_model = Sequential()
    single_query_model.add(Conv1D(1, 20, input_shape = (20, 4)))
    single_query_model.add(Flatten())

    single_query_model.fit(targets, scores)

    query_weights.append(single_query_model.layers[0].get_weights())

multi_query_model_long_targets = Sequential()
multi_query_model_long_targets.add(Conv1D(len(query_weights), 20, input_shape = (100, 4)))

multi_query_model_long_targets.layers[0].set_weights(combine_weights(query_weights))

multi_query_model_long_targets.summary()

The combine_weights function just does some unpacking and matrix rearrangement to stack the filters in the way Conv1D wants.

This solution fixes the data duplication issue but it screws me in other ways. One is data based ... my data contains many query, target pairs but it tends to be the same target many querys, since it is easier to generate the real-world data in that orientation. So, doing it this way makes the training difficult. Second, this assumes that each query works in an independent way, when in reality, I know that the query, target pairing is what is actually important. So it makes sense to use a model that can look at many examples of the pairs, and not individuals.

Is there a way to combine both methods? Is there a way to make it so Conv1D takes both the long target tensor combine it with the constant query as it walks along the sequence?

today
  • 32,602
  • 8
  • 95
  • 115
JudoWill
  • 4,741
  • 2
  • 36
  • 48
  • To make sure I understand your problem: assuming you have a target of length 100, you want to find out whether each of `target[0:20]`, `target[1:21]`, `target[2,22]`, ..., `target[-20:]` match with a `query` of length 20, using your trained model? And maybe each target may have a length of `k` where `k` is not necessarily 100? – today Jun 24 '18 at 16:51
  • @today. That is correct ... although it will be `target[0:20, :]`, `target[1:21, :]`, ... since the matching requires all 4 channels in the evaluation. I'm generally assuming that `k` will be the same for each batch. Ultimately I'll end up taking the maximum matching score for each target into the next layer. So different target lengths won't effect downstream layers. – JudoWill Jun 24 '18 at 17:32
  • Have you tried using [`tf.extract_image_patches()`](https://www.tensorflow.org/api_docs/python/tf/extract_image_patches)? It is basically what you are looking for. Let me know if you could not use it. – today Jun 24 '18 at 17:36
  • @today maybe although it looks like it would need some finagling. `tf.extract_image_patches()` requires a 4D tensor `[batch, in_rows, in_cols, depth]` where mine is a 2D. And its not clear how the tensors come out (I'm AFK, so can't test). If you can write it as an answer with some basic code I'd be happy to test it tonight and award the bounty. – JudoWill Jun 24 '18 at 18:13
  • Ok. I'll write an answer. One more question: Is it okay if the target be transformed from shape `(batch_size, 100, 4)` to `(batch_size, 81, 20, 4)` where 81 is exactly the number of sliding windows (i.e. patches)? Could you handle it or you want the shape to be`(batch_size*81, 20, 4)`? – today Jun 24 '18 at 18:17
  • Now that I think about your general problem (i.e. find matching score of a target sequence of arbitrary length to a fixed length query), I think it can be reformulated in a way that you don't need an explicit sliding window operation. But, I couldn't understand the issues you have mentioned when using `Conv1D` layer? Could you please post the code of your convolutional model as well? – today Jun 24 '18 at 19:44
  • Since I didn't know the `tf.exctract_image_patches()` I couldn't figure out a way to get the `query` to be applied at each window. So I could train a filter for `query1`, `query2`, etc and run them across the target. But given a new unseen query, it required training of a new filter for that specific query. And that takes time. I tried to train a model to determine the appropriate `Conv1D` weights based on a query, but that was a terrible mess. – JudoWill Jun 24 '18 at 20:10
  • I don't see why this has to be done by the model. I might be missing something, but why aren't we just pre-processing the longer sequences into smaller ones in NumPy and then predicting them against the same query? Your data is already in pairs as well so don't modify the model at all. – nuric Jun 24 '18 at 21:19
  • @nuric The short answer is that being able to feed everything through the model at once will make some downstream aspects easier. You're right that there is a way to make it work with preprocessing, numpy, loops, and generators. This type of thing comes up a lot in my work and understanding a general solution will probably come in handy later. And I couldn't find an answer ANYWHERE. An on-point Stackoverflow answer can help everyone. Lastly, I was annoyed that I just couldn't figure something that seemed so simple. – JudoWill Jun 24 '18 at 23:28

2 Answers2

13

Just to provide an alternative solution using Keras backend functions.

You can also generate sliding windows with K.arange and K.map_fn:

def sliding_windows(inputs):
    target, query = inputs
    target_length = K.shape(target)[1]  # variable-length sequence, shape is a TF tensor
    query_length = K.int_shape(query)[1]
    num_windows = target_length - query_length + 1  # number of windows is also variable

    # slice the target into consecutive windows
    start_indices = K.arange(num_windows)
    windows = K.map_fn(lambda t: target[:, t:(t + query_length), :],
                       start_indices,
                       dtype=K.floatx())

    # `windows` is a tensor of shape (num_windows, batch_size, query_length, ...)
    # so we need to change the batch axis back to axis 0
    windows = K.permute_dimensions(windows, (1, 0, 2, 3))

    # repeat query for `num_windows` times so that it could be merged with `windows` later
    query = K.expand_dims(query, 1)
    query = K.tile(query, [1, num_windows, 1, 1])

    # just a hack to force the dimensions 2 to be known (required by Flatten layer)
    windows = K.reshape(windows, shape=K.shape(query))
    return [windows, query]

To use it:

long_target = Input((None, 4))
choose_query = Input((20, 4))
windows, query = Lambda(sliding_windows)([long_target, choose_query])

Given your pretrained match_model, the problem with TimeDistributed is that it cannot wrap a Keras Model with multiple inputs.

However, since the logic matching target and query is implemented in the layers after Concatenate, you can collect these layers into a Model, and apply TimeDistributed to it:

submodel_input = Input((20, 4, 2))
x = submodel_input
for layer in match_model.layers[-4:]:  # the `Lambda(sum_seqs)` layer
    x = layer(x)
submodel = Model(submodel_input, x)

Now you just need to process and merge the outputs of sliding_windows in the same way as in match_model:

long_target = Input((None, 4))
choose_query = Input((20, 4))
windows, query = Lambda(sliding_windows)([long_target, choose_query])

windows_pad = Lambda(lambda x: K.expand_dims(x))(windows)
query_pad = Lambda(lambda x: K.expand_dims(x))(query)
merged = Concatenate()([windows_pad, query_pad])

match_scores = TimeDistributed(submodel)(merged)
max_score = GlobalMaxPooling1D()(match_scores)
model = Model([long_target, choose_query], max_score)

model can then be used in an end-to-end fashion for matching long targets.

You can also verify that the output of model is indeed the maximum of the matching scores by applying match_model to sliding windows:

target_arr = np.random.rand(32, 100, 4)
query_arr = np.random.rand(32, 20, 4)

match_model_scores = np.array([
    match_model.predict([target_arr[:, t:t + 20, :], query_arr])
    for t in range(81)
])
scores = model.predict([target_arr, query_arr])

print(np.allclose(scores, match_model_scores.max(axis=0)))
True
Yu-Yang
  • 14,539
  • 2
  • 55
  • 62
  • Great! That's a pure tensorflow/Keras solution! @JudoWill if you ask my opinion you should accept this answer and award it the bounty since it is better than mine and more complete (although, as you can see in this solution and as I mentioned before, there is no way around data replication; and trust me, it does more good than harm!) – today Jun 25 '18 at 04:12
10

Note: look at @Yu-Yang's solution. It is much better.


Well, as I mentioned in my comment, you can use tf.exctract_image_patches() (if the documentation seems a bit vague read this answer on SO) to extract patches (Edit: I just added two variables win_len and feat_len and changed 100 to None and 81 to -1 to make it work with the target sequences of arbitrary length):

import tensorflow as tf
from keras import layers, models
import keras.backend as K

win_len = 20   # window length
feat_len = 4   # features length

def extract_patches(data):
    data = K.expand_dims(data, axis=3)
    patches = tf.extract_image_patches(data, ksizes=[1, win_len, feat_len, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1], padding='VALID')
    return patches

target = layers.Input((None, feat_len))
patches = layers.Lambda(extract_patches)(target)
patches = layers.Reshape((-1, win_len, feat_len))(patches)

model = models.Model([target], [patches])
model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, None, 4)           0         
_________________________________________________________________
lambda_2 (Lambda)            (None, None, None, 80)    0         
_________________________________________________________________
reshape_2 (Reshape)          (None, None, 20, 4)       0         
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________

For example, if input target has a shape of (100, 4), the output shape is (81, 20, 4).

Here is a test:

import numpy as np

# an array consisting of numbers 0 to 399 with shape (100, 4)
target = np.arange(1*100*4*1).reshape(1, 100, 4)
print(model.predict(a))

Here is the output:

[[[[  0.   1.   2.   3.]
   [  4.   5.   6.   7.]
   [  8.   9.  10.  11.]
   ...
   [ 68.  69.  70.  71.]
   [ 72.  73.  74.  75.]
   [ 76.  77.  78.  79.]]

  [[  4.   5.   6.   7.]
   [  8.   9.  10.  11.]
   [ 12.  13.  14.  15.]
   ...
   [ 72.  73.  74.  75.]
   [ 76.  77.  78.  79.]
   [ 80.  81.  82.  83.]]

  [[  8.   9.  10.  11.]
   [ 12.  13.  14.  15.]
   [ 16.  17.  18.  19.]
   ...
   [ 76.  77.  78.  79.]
   [ 80.  81.  82.  83.]
   [ 84.  85.  86.  87.]]

  ...

  [[312. 313. 314. 315.]
   [316. 317. 318. 319.]
   [320. 321. 322. 323.]
   ...
   [380. 381. 382. 383.]
   [384. 385. 386. 387.]
   [388. 389. 390. 391.]]

  [[316. 317. 318. 319.]
   [320. 321. 322. 323.]
   [324. 325. 326. 327.]
   ...
   [384. 385. 386. 387.]
   [388. 389. 390. 391.]
   [392. 393. 394. 395.]]

  [[320. 321. 322. 323.]
   [324. 325. 326. 327.]
   [328. 329. 330. 331.]
   ...
   [388. 389. 390. 391.]
   [392. 393. 394. 395.]
   [396. 397. 398. 399.]]]]
today
  • 32,602
  • 8
  • 95
  • 115
  • Based on the shapes this is exactly what I'm looking for. I'll try it out tonight and see if it works! – JudoWill Jun 24 '18 at 20:11
  • @JudoWill I hope it works... but after reading your question for the second time, I suspect you can easily use the output of above model without any modifications/post-processing; since as I understood you are looking for a **pure** Keras/tensorflow solution that could be packaged as **one single Keras model** such that for a given target sequence and a given query, finds the matching score of each sub-sequence of target with the given query using your pre-trained model (i.e. `match_model`). Anyway, test this solution and if it does not work, feel free to let me know to discuss alternatives. – today Jun 24 '18 at 20:31
  • and @nuric. Both of your answers work with what I'm looking to do but require replicating the `target` data `m` times for query lengths of `m`. The values at `target[20,:]` is replicated 20X (across the first window to the 20th window). I was looking for something that would be able to actually "walk" with window along without having to replicate the data. I'm adding my hackish `Conv1D` solution, maybe that'll spark an idea for how to do this with data replication. – JudoWill Jun 25 '18 at 00:21
  • @JudoWill I see. So, you want the solution to be purely implemented in Keras/tensorflow as I understand? Or is it okay if it uses numpy or python methods? Since obviously one solution is a python method that takes a target and query as inputs and slides over the target in a loop and apply your pre-trained model on it? This way it does not replicate data. And by the way, are there any memory restrictions since you are concerned with data replication? – today Jun 25 '18 at 00:30
  • @JudoWill "I was hoping to find a solution that would actually "slide" the evaluation across the target without the duplication." I am a little confused, actually. Apart from the fact that your convolutional solution may not be a good fit, you are accumulating the weight of `Conv` layer in a list and if the number of target and query pairs is high, then it would cost a lot of memory. Plus, you say that "I was hoping to find a solution that would actually *slide* the evaluation across the target without the duplication.": Be aware that this means hurting the performance of the model. >>>>>>> – today Jun 25 '18 at 01:03
  • 1
    >>>>> The good thing about data replication is the fact that it could exploit parallelism. Even the convolution operation in most of major deep learning libraries are implemented by extracting all the patches in data and then applying the kernel on all the patches simultaneously (e.g. in GPU). [(Actually all the patches are stored in a matrix)](http://caffe.berkeleyvision.org/tutorial/layers/im2col.html) – today Jun 25 '18 at 01:04
  • I agree, the `Conv1D` solution I posted was a terrible one, it was just the only way I could think of and I thought there might be a way to modify it for my purposes. I didn't know the under-the-hood implementation of `Conv1D` was a tiling, I figured it was some sort of progressive rolling method. I guess I'll just have to live with the duplication and modify other things to account for it. Thanks for all the help!! – JudoWill Jun 25 '18 at 01:11
  • @JudoWill I don't know the exact requirements of your problem and nature of your data so what I am going to say may need modifications or be wrong at all: If I were you, I would focus on the big picture ---> We have two sequences (possibly, of same nature e.g. text, genome, music freq, etc.) and we want to find their matching score (no matter what their length is). So one solution is that to encode both seqs using one or multiple LSTM or Conv layers and then find the score between two encoded seqs. This way you can build one single model that takes target and query pair and outputs a score. > – today Jun 25 '18 at 01:19
  • > It would be highly accurate (though, it depends on your data, the architecture you use, the hyper-parameters, etc.) if you tune it well and it is also very fast. If you are interested to discuss this approach, let me know. – today Jun 25 '18 at 01:22
  • @JudoWill However, If all of your training data has a fixed length (say 20) and not variable length, and also your pre-trained model (i.e. `match_model`) already has good accuracy, then I think the best solution would be feeding the extracted patches (i.e. using sliding window as in my answer or @nuric answer) to your pre-trained model. – today Jun 25 '18 at 01:42