
I have data with 50 samples per time series, and I want to build a time series classifier.

Each sample has three inputs: a vector of shape 1x768, a vector of shape 1x25, and a vector of shape 1x496.

Each input comes from a different modality, so each needs to go through some input-specific layers before all of them are concatenated.

The data is stored in a dataframe:

df = time_series_id  timestamp  input1      input2     input3      time_series_label
            0           0       [x0..x768]  [x0..x25]  [x0..x496]  A
            0           1       [x0..x768]  [x0..x25]  [x0..x496]  A
           ..
            0           50      [x0..x768]  [x0..x25]  [x0..x496]  A
            1           0       [x0..x768]  [x0..x25]  [x0..x496]  B
           ..
            1           50      [x0..x768]  [x0..x25]  [x0..x496]  B

I am new to DL and I want to build a network that classifies each 50-timestamp-long time series into one of 2 classes, but I couldn't find any tutorial that shows how to feed multimodal data into Conv1D or LSTM layers.

How can I build such a network, preferably with Keras, and train it on my dataframe in order to classify time series? (So that when I give it a new time series of 50 timestamps, I get an A/B prediction for the entire series.)

Please notice that the label is the same for all rows with the same id, so each time I need to feed the RNN only with samples that share the same id.

Cranjis

2 Answers


I have created a nice example for you:

import numpy as np
import pandas as pd

# Define a mini-dataset similar to your example: 100 rows, 2 columns
# (each value in column A is an array of size 768, in column B of size 25)
df = pd.DataFrame({'A': [np.zeros(768)] * 100, 'B': [np.ones(25)] * 100})

Preprocess the data into rolling windows of 50 timestamps:

# Create windows of data:
list_of_indexes = []
# rolling(50) walks a 50-row window over the index; the lambda stores each
# window's indexes as a side effect (returning 0 just to satisfy apply)
df.index.to_series().rolling(50).apply(lambda x: list_of_indexes.append(x.tolist()) or 0, raw=False)
d_A = df.A.apply(list)
d_B = df.B.apply(list)
# stack the 50 rows of each window into one sample
a = [[d_A[ix] for ix in x] for x in list_of_indexes]
b = [[d_B[ix] for ix in x] for x in list_of_indexes]
a = np.array(a)
b = np.array(b)

print(f'a shape: {a.shape}')
print(f'b shape: {b.shape}')

Data after preprocessing:

a shape: (51, 50, 768)
b shape: (51, 50, 25)

Explanation:

a holds 51 samples; each sample contains 50 timestamps, and each timestamp contains 768 values. (b is the same, with 25 values.)
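
To see where the 51 comes from: sliding a window of length 50 over 100 rows yields 100 - 50 + 1 = 51 windows. Here is a minimal sketch of the same idea, assuming NumPy >= 1.20 for sliding_window_view (an alternative to the rolling trick above):

import numpy as np

# toy series of 6 timestamps; a window of 3 yields 6 - 3 + 1 = 4 windows
series = np.array([1, 2, 3, 4, 5, 6])
windows = np.lib.stride_tricks.sliding_window_view(series, window_shape=3)
print(windows)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]]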

Create a model with two inputs, input a and input b; you can process each of them separately and then concatenate them.

from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense, concatenate
from tensorflow.keras.models import Model

# define two sets of inputs
input_A = Input(shape=(50, 768))
input_B = Input(shape=(50, 25))

LSTM_A = Bidirectional(LSTM(32))(input_A)
LSTM_B = Bidirectional(LSTM(32))(input_B)
               
combined = concatenate([
                        LSTM_A,
                        LSTM_B
                       ])
dense1 = Dense(32, activation='relu')(combined)
output = Dense(1, activation='sigmoid')(dense1)
model = Model(inputs=[
                     input_A,
                     input_B
                     ], outputs=output)
model.summary()

Model summary:

(screenshot of the model.summary() output omitted)

Fit the model:

from tensorflow.keras.optimizers import Adam

y = np.random.randint(0, 2, size=(a.shape[0],))  # dummy 0/1 labels so the example runs
adam = Adam(learning_rate=0.00001)
model.compile(loss='binary_crossentropy', optimizer=adam)
history = model.fit([a, b], y, batch_size=2, epochs=2)


Of course, you can concatenate before the LSTM:

# define two sets of inputs
input_A = Input(shape=(50, 768))
input_B = Input(shape=(50, 25))

combined = concatenate([
                        input_A,
                        input_B
                       ])
LSTM_layer = Bidirectional(LSTM(32))(combined)
dense1 = Dense(32, activation='relu')(LSTM_layer)
output = Dense(1, activation='sigmoid')(dense1)
model = Model(inputs=[
                     input_A,
                     input_B
                     ], outputs=output)
model.summary()
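
The question also mentions Conv1D. As a hedged sketch, the same two-input pattern works with a convolutional stack in place of the LSTM; the filter count and kernel size below are illustrative assumptions, not part of the original answer:

from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D

# same two inputs, concatenated per timestep as above
input_A = Input(shape=(50, 768))
input_B = Input(shape=(50, 25))
combined = concatenate([input_A, input_B])        # shape: (50, 793)
conv = Conv1D(filters=64, kernel_size=3, activation='relu')(combined)
pooled = GlobalMaxPooling1D()(conv)               # collapse the time axis
dense1 = Dense(32, activation='relu')(pooled)
output = Dense(1, activation='sigmoid')(dense1)
model = Model(inputs=[input_A, input_B], outputs=output)
model.summary()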


EDIT:

The df:

(screenshot of the dataframe omitted.) Shape: (100, 4)

Preprocessing code:

def split_into_inputs(group):
    # wrap each group's column in list() so it becomes a proper nested list
    # of shape (n_timestamps, n_features); this fix came up in the comments below
    x_data_inp1.append(list(group.input1))
    x_data_inp2.append(list(group.input2))
    # assuming each time_series_id has the same label for all of its rows
    # (that's what I understood from the question details)
    y_data.append(group.time_series_label.unique()[0])


x_data_inp1 = []
x_data_inp2 = []
y_data = []

df.groupby('time_series_id').apply(split_into_inputs)
# convert the lists into arrays with a float dtype to match the NN
# (the np.float alias was removed from NumPy, so use plain float)
x_data_inp1 = np.array(x_data_inp1, dtype=float)
x_data_inp2 = np.array(x_data_inp2, dtype=float)

# Convert labels from chars into digits
from sklearn.preprocessing import LabelEncoder
# creating instance of labelencoder
labelencoder = LabelEncoder()
# Assigning numerical values. Convert 'A','B' into 0, 1
y_data = labelencoder.fit_transform(y_data)
x_data_inp1.shape, x_data_inp2.shape, y_data.shape

Output:

((2, 50, 768), (2, 50, 25), (2,))

After the preprocessing of our 100 samples, there are 2 sequences of 50 samples each, according to the "time_series_id" column, and there are 2 labels: label A as 0 for the first sequence and label B as 1 for the second. Question: does each sequence of 50 samples have a different "time_series_id"?

Defining the model:

# define two sets of inputs
input_A = Input(shape=(50, 768))
input_B = Input(shape=(50, 25))

LSTM_A = Bidirectional(LSTM(32))(input_A)
LSTM_B = Bidirectional(LSTM(32))(input_B)

combined = concatenate([
                        LSTM_A,
                        LSTM_B
                       ])
dense1 = Dense(32, activation='relu')(combined)
output = Dense(1, activation='sigmoid')(dense1)
model = Model(inputs=[
                     input_A,
                     input_B
                     ], outputs=output)
model.summary()

Fitting the model:

adam = Adam(learning_rate=0.00001)
model.compile(loss='binary_crossentropy', optimizer=adam)
history = model.fit([x_data_inp1, x_data_inp2], y_data, batch_size=2, epochs=2)
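
Once trained, classifying a new 50-timestamp series is a single predict call. Here is a minimal sketch; the new_inp1/new_inp2 arrays are hypothetical placeholders shaped like one preprocessed series:

# hypothetical new series, preprocessed the same way: shapes (1, 50, 768) and (1, 50, 25)
new_inp1 = np.zeros((1, 50, 768))
new_inp2 = np.zeros((1, 50, 25))
prob = model.predict([new_inp1, new_inp2])[0][0]  # sigmoid output in [0, 1]
# map the 0/1 decision back to the original 'A'/'B' labels via the fitted LabelEncoder
predicted_label = labelencoder.inverse_transform([int(prob > 0.5)])[0]
print(predicted_label)
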
Niv Dudovitch
  • Thanks! Can you please explain the preprocessing/rolling steps? What is the purpose behind this? And why do you want 51 rows after it (and 100 before)? – Cranjis Sep 06 '21 at 09:10
  • Yep, I understand, but why convert it to (51,50,X)? Why 51? And why is this step necessary? – Cranjis Sep 06 '21 at 09:15
  • Sent the comment before I finished writing, sorry. The 100 rows are just for the example. I have created a sliding window of the size you mentioned (50). A sliding window is a popular technique (example: https://stackoverflow.com/questions/8269916/what-is-sliding-window-algorithm-examples). – Niv Dudovitch Sep 06 '21 at 09:22
  • Continuing the previous comment: regarding (51,50,X), 50 is the number of samples per time series (the look-back size); 51 is the number of samples after creating the sliding window on the set with 100 timestamps (as I mentioned before, 100 is just for the example; I don't know the size of your real data). For example, if we use a sliding window of 3 on this data: [1,2,3,4,5,6], we get [1,2,3], [2,3,4], [3,4,5], [4,5,6]. Shape: (4,3,X) – Niv Dudovitch Sep 06 '21 at 09:23
  • This step is necessary because you want to feed the model with 50 samples (rows) together each time instead of one sample at a time; that way the model can get the information from all of the last 50 samples instead of just one, and the RNN (LSTM) layer can extract information from the past to the present. – Niv Dudovitch Sep 06 '21 at 09:25
  • You got it? Any questions? – Niv Dudovitch Sep 06 '21 at 09:56
  • I think so, but I am not sure why the LSTM is before the concatenation? The LSTM should take into account the data from each modality before conducting the time-dependent analysis, so I think the concat should come before the LSTM? – Cranjis Sep 06 '21 at 11:04
  • Yes, you probably need to concatenate before the LSTM layer. Sometimes, when using data from different timestamps (one input of the last 50, and one input of 50 samples from one year ago), you can process them in different LSTM layers and then concatenate. But in your case you are right: concatenate before the LSTM layer. – Niv Dudovitch Sep 06 '21 at 11:23
  • I added to my solution the option with the concatenation before the LSTM layer. – Niv Dudovitch Sep 06 '21 at 11:23
  • Does this solution answer your question? If you encounter any complications feel free to talk to me. – Niv Dudovitch Sep 09 '21 at 09:53
  • There is actually one major problem: in your proposed preprocessing, to my understanding, a single segment of 50 time samples could contain samples from other time series. For example, since your rolling doesn't account for the column 'time_series_id', after 30 steps there will be 20 examples from time_series_id == 0 and 30 from time_series_id == 1. However, they have different labels (see the edited dataframe). How can you mitigate this? – Cranjis Sep 09 '21 at 11:50
  • Ok, if I understood your case correctly, the sliding-window technique isn't good for this case. I am editing my solution according to what I understood from you; if something isn't correct, please give more details about your data and what the task is. – Niv Dudovitch Sep 09 '21 at 13:01
  • Thanks, yes, that's what I meant, but for the line np.array(x_data_inp1, dtype=np.float) I am getting the error {ValueError} setting an array element with a sequence. Any idea why? At each iteration, group.input1 is a Series with 50 elements. Each element is a list with a length of 768, so I am not sure why it isn't working. – Cranjis Sep 09 '21 at 19:26
  • Ok, don't worry, it's a common error. Check the dtype of the columns and their shapes; those are the main reasons for this error. – Niv Dudovitch Sep 09 '21 at 22:01
  • The dtype of the columns (both input1 and input2) is object, but I assume this is reasonable as each cell contains a list? The objects themselves (those in the 768-long list and in the 25-long list) are always floats. – Cranjis Sep 10 '21 at 06:12
  • Yes, it's reasonable. If the objects themselves are always of type float, you probably don't need to convert them to floats (in my example they weren't of float type, so I converted them because the model needs this type as input to work correctly). – Niv Dudovitch Sep 11 '21 at 11:35
  • Yes, but if I change this line to x_data_inp1 = np.array(x_data_inp1, dtype=np.float), I get that x_data_inp1.shape is (52,96) (96 is the number of samples per time_series_id), where every cell is a list of 768 elements. – Cranjis Sep 11 '21 at 18:06
  • The shape should have 3 dimensions, as you said: (52,96,768). It seems like you didn't change anything in the line. – Niv Dudovitch Sep 11 '21 at 18:17
  • A problem can arise if not all of them have the same number of timestamps (which you mentioned is 96). – Niv Dudovitch Sep 11 '21 at 18:18
  • Sorry, I meant x_data_inp1 = np.array(x_data_inp1, dtype=object). And I checked, all are 96. Yes, it should be (52, 96, 768); I'm not sure why it's only 2D. – Cranjis Sep 11 '21 at 18:54
  • Did you try to reshape it to the desired shape? – Niv Dudovitch Sep 11 '21 at 23:16
  • I get {ValueError} cannot reshape array of size 4992 into shape (52,96,768). – Cranjis Sep 12 '21 at 05:42
  • It eventually worked after changing to x_data_inp1.append(list(group.input1)), thanks! – Cranjis Sep 12 '21 at 06:33

Use some networks (linear layers, MLPs, etc.) to embed them to the same dimension; then you can add them, multiply them element-wise, use a bi(tri)linear layer, or whatever you want, to combine them into a dimension-unified input for RNNs or CNNs. Or you can just concatenate them at each timestep, so there is one combined feature vector per timestep, and that will be fine for CNNs. A sketch of this is shown below.
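
Here is a minimal Keras sketch of that idea, using the per-modality Dense sizes the asker suggests in a comment below (Dense(25) for input1, Dense(30) for input3); all layer sizes and activations are illustrative assumptions:

from tensorflow.keras.layers import Input, Dense, TimeDistributed, concatenate, Bidirectional, LSTM
from tensorflow.keras.models import Model

# one input per modality, each a 50-timestamp sequence
input1 = Input(shape=(50, 768))
input2 = Input(shape=(50, 25))
input3 = Input(shape=(50, 496))

# embed each modality per timestep before combining
emb1 = TimeDistributed(Dense(25, activation='relu'))(input1)
emb3 = TimeDistributed(Dense(30, activation='relu'))(input3)

# concatenate per timestep: 25 + 25 + 30 = 80 features per timestep
combined = concatenate([emb1, input2, emb3])

# a bidirectional LSTM then treats all 50 timestamps as one sequence
seq = Bidirectional(LSTM(32))(combined)
output = Dense(1, activation='sigmoid')(seq)
model = Model(inputs=[input1, input2, input3], outputs=output)
model.summary()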

  • Please see my edit - the classification is at the time-series level – Cranjis Aug 31 '21 at 08:58
  • Most simply, concatenate them per timestep, and there you go. – KazamiMichiru Aug 31 '21 at 11:11
  • I want to run each modality through some layers before concatenation. Input1 should first go through Dense(25) and input3 should go through Dense(30), so I will get a vector of 1x80 after concatenation, and it should be the input of the Conv1D. I don't understand how I make the NN "understand" that all 50 timestamps belong to the same time series. Do you maybe have a code sample? – Cranjis Aug 31 '21 at 11:19
  • You can simply use a sequential model like an RNN, and in your case a bidirectional one (like Bi-LSTM or Bi-GRU) seems to be better. In the RNN case, the full set of 50 inputs is combined sequentially and considered as one input, shaped like [T, F], where T is the time length and F is your feature size at each timestep. – KazamiMichiru Aug 31 '21 at 11:52
  • Sorry, I understand it theoretically, but I just don't understand how to implement a network that can meet both demands: 1) all 50 timestamps are part of the same time series; 2) I have 3 inputs and each can go through different layers before concatenation. Do you maybe have a code sample for that? – Cranjis Aug 31 '21 at 17:55