Questions tagged [multimodal]

32 questions
4
votes
0 answers

can't change embedding dimension to pass it through gpt2

I'm practicing image captioning and have some problems with tensors of different dimensions. I have an image embedding of size [1, 512], but GPT2, which I use for caption generation, needs size [n, 768], where n is the number of tokens in the caption's…
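A common fix for this kind of shape mismatch (a sketch, not taken from the question) is to learn a linear projection from the 512-dim image embedding into GPT-2's 768-dim embedding space and prepend the result as a "visual prefix" token. A minimal numpy sketch, with a random matrix standing in for a trained `nn.Linear(512, 768)` and random vectors standing in for real token embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

image_emb = rng.standard_normal((1, 512))   # CLIP-style image embedding, shape [1, 512]
W = rng.standard_normal((512, 768)) * 0.02  # stand-in for a learned projection, e.g. nn.Linear(512, 768)

prefix = image_emb @ W                      # shape [1, 768] -- one "visual token" for GPT-2
token_embs = rng.standard_normal((5, 768))  # stand-in for embeddings of n=5 caption tokens

# Concatenate along the sequence axis: GPT-2 then sees [n+1, 768]
gpt2_input = np.concatenate([prefix, token_embs], axis=0)
print(gpt2_input.shape)  # (6, 768)
```

In a real PyTorch setup the projection weights would be trained jointly with (or before) the captioning objective, and the concatenated sequence would be passed to GPT-2 via `inputs_embeds`.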
3
votes
0 answers

How to pass one data array per model input in multimodal deep autoencoder?

I'm working on a deep multimodal autoencoder for dimensionality reduction, and I'm following this code (https://wizardforcel.gitbooks.io/deep-learning-keras-tensorflow/8.2%20Multi-Modal%20Networks.html): from keras.layers import Dense, Input from…
Andrea
  • 113
  • 1
  • 7
2
votes
1 answer

What method and tool for regression analysis for a multimodal distribution in R?

I have a set of variables X1 and X2 and Y, with the relationship plotted below. X2 values are used for color coding. X1, X2, and X3 are integer variables. The observed pattern is multimodal. What is the best way to predict Y based on X1 and…
vp_050
  • 583
  • 2
  • 4
  • 16
1
vote
0 answers

How to combine multiple images with one signal data in a dataset (Python/PyTorch/MultiModal)

I want to build a multimodal model; for every signal sequence I have several pictures. For example, I have 10 images that correspond to 5 s of force data, which I want to combine into one batch. That means I want to build a model where those 10…
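One way to pair several images with one signal window (a sketch with assumed shapes, not taken from the question) is to stack the frames along a new leading axis and return them together with the signal as a single sample; a PyTorch `Dataset.__getitem__` would return exactly this pair as tensors. A numpy sketch:

```python
import numpy as np

# Hypothetical shapes: 10 grayscale frames of 64x64 pixels per 5-second force window
n_frames, height, width = 10, 64, 64
signal_len = 500  # e.g. 5 s of force data sampled at 100 Hz

rng = np.random.default_rng(0)
frames = [rng.standard_normal((height, width)) for _ in range(n_frames)]
signal = rng.standard_normal(signal_len)

# One sample = all frames stacked along a new leading axis, paired with its signal.
images = np.stack(frames, axis=0)      # shape (10, 64, 64)
sample = {"images": images, "signal": signal}
print(sample["images"].shape, sample["signal"].shape)
```

With this layout, a DataLoader batch has shape `(batch, 10, H, W)` for the images and `(batch, signal_len)` for the signals, so the model can encode the two modalities separately and fuse them.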
SunIsGod
  • 11
  • 2
1
vote
1 answer

get contrastive_logits_per_image with flava model using huggingface library

I used the FLAVA model code from this link: https://huggingface.co/docs/transformers/model_doc/flava#transformers.FlavaModel.forward.example but I am getting the following error: 'FlavaModelOutput' object has no attribute…
1
vote
1 answer

prediction logits using lxmert with hugging face library

How can we get the prediction logits from the lxmert model using the Hugging Face library? It's fairly easy to get them with visualbert, but I'm not able to with the lxmert model. In the case of the visualbert model, the keys I'm getting are…
1
vote
0 answers

Are there any alternatives to COVAREP in python?

I find that many multimodal sentiment analysis datasets (like CMU-MOSI) use COVAREP to extract the audio features (74 dimensions). But I'm not familiar with Matlab, so I wonder if there is some way for me to get the same features as COVAREP…
junyi chen
  • 11
  • 1
1
vote
0 answers

how can we apply masked language modelling on the images using multimodal models? How can we implement such a thing and get MLM scores?

How can we apply masked language modelling when both text and an image are given, using multimodal models like lxmert? For example, if there is some text given (This is a MASK) and we mask some…
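However the logits are obtained, an MLM "score" for a candidate token is just the log-softmax of the model's per-position vocabulary logits at the masked index. A minimal numpy sketch (the toy logits here are random; with lxmert they would come from the model's prediction head):

```python
import numpy as np

def mlm_score(logits, masked_pos, token_id):
    """Log-probability of `token_id` at `masked_pos`, given (seq_len, vocab) logits."""
    row = logits[masked_pos]  # shape (vocab_size,)
    # numerically stable log-softmax: row - logsumexp(row)
    log_probs = row - (np.log(np.sum(np.exp(row - row.max()))) + row.max())
    return log_probs[token_id]

# Toy example: 4 positions, vocabulary of 6; position 3 is the masked token.
rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 6))
score = mlm_score(logits, masked_pos=3, token_id=2)
print(float(score))  # a log-probability, so always <= 0
```

Ranking candidate tokens by this score at the masked position reproduces the usual "fill-mask" behaviour.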
1
vote
0 answers

Layer "model" expects 2 input(s), but it received 1 input tensors

I built a VQA model with two inputs (images, questions). It trained fine with the train/val datasets, but with test_dataset it keeps printing errors like the one below: ValueError: Layer "model" expects 2 input(s), but it received 1 input tensors. Inputs…
1
vote
0 answers

Detect multimodal distribution and split the data in R

I have data with more than 10000 distributions that look like the ones in red. I want to compare each of them with a reference distribution like the one in blue. Because some are unimodal and some are multimodal, I cannot use a t-test for all of…
RCchelsie
  • 111
  • 6
1
vote
0 answers

Test differences in multimodal distributions for different groups in R or Python

I am analyzing data from 3 different gait speeds. For each group/speed, I determine a specific value called "angle". Each group has a different sample size. So I need to compare multimodal distributions, and I would like to statistically test…
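A distribution-free option for this kind of comparison (a Python sketch, since the question allows R or Python) is the two-sample Kolmogorov-Smirnov statistic, which compares whole empirical CDFs and so makes no unimodality assumption; for circular "angle" data a circular variant such as Kuiper's test may be more appropriate. A self-contained numpy implementation of the statistic:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])           # the sup is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(0)
# Two synthetic bimodal "angle" samples with different sample sizes
g1 = np.concatenate([rng.normal(10, 2, 150), rng.normal(40, 3, 150)])
g2 = np.concatenate([rng.normal(12, 2, 100), rng.normal(40, 3, 100)])
print(ks_statistic(g1, g2))
```

For p-values, `scipy.stats.ks_2samp` implements the same statistic with its sampling distribution; unequal group sizes are handled naturally.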
1
vote
2 answers

How to use the modal in the list in react native (a specific Modal for each list item)?

I made a customized list component (in React Native) which shows touchable images with some description text. I need each image to open a specific Modal, but I don't know how or where I should code the Modal. ... here is my photo list…
1
vote
1 answer

Plot unimodal distributions determined from a multimodal distribution

I've used GaussianMixture to analyze a multimodal distribution. From the GaussianMixture class I can access the means and covariances using the attributes means_ and covariances_. How can I use them to now plot the two underlying unimodal…
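Once `means_` and `covariances_` are available, each underlying unimodal curve is just the component weight times a Gaussian pdf, which can be computed directly with numpy. A sketch assuming 1-D data and `covariance_type='full'` (where sklearn returns `means_` of shape `(k, 1)` and `covariances_` of shape `(k, 1, 1)`); the numeric values below are illustrative, not fitted:

```python
import numpy as np

def component_pdfs(x, means, covariances, weights):
    """Per-component weighted Gaussian pdfs for a 1-D GaussianMixture,
    using means_, covariances_, weights_ as returned by sklearn."""
    pdfs = []
    for mu, var, w in zip(np.ravel(means), np.ravel(covariances), weights):
        pdfs.append(w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))
    return pdfs

x = np.linspace(-10, 10, 2001)
pdfs = component_pdfs(x,
                      means=[[-2.0], [3.0]],          # gm.means_
                      covariances=[[[1.0]], [[4.0]]], # gm.covariances_
                      weights=[0.4, 0.6])             # gm.weights_
# Each unimodal curve can then be drawn with matplotlib: plt.plot(x, pdfs[0]), etc.
```

Each weighted curve integrates to its component weight, so plotting them alongside a histogram (with `density=True`) shows how the mixture decomposes the multimodal distribution.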
riyansh.legend
  • 117
  • 1
  • 13
1
vote
0 answers

How to implement three-way clustering in python

I am relatively new to the field of data science. Recently I came across these concepts and I am really keen to implement them, i.e. the concept of multimodal clustering applications. (From here I got the idea -…
K C
  • 413
  • 4
  • 15
1
vote
0 answers

Can pre-trained ResNet50 be used for very low resolution image?

I need to find the best-matching image given a text description. However, the resolution is very low, i.e. 50 x 50 pixels. In this case, can pre-trained ResNet50 be used? Or any recommendations for a better architecture? Thanks!
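A pre-trained ResNet50 generally expects 224x224 inputs, so the usual workaround is to upsample the 50x50 images first (in practice via torchvision or PIL resizing; bicubic interpolation is common). A dependency-free nearest-neighbour sketch of the index mapping involved:

```python
import numpy as np

def nearest_resize(img, out_h, out_w):
    """Nearest-neighbour upsample of an (H, W, C) image -- a stand-in for
    torchvision/PIL resizing before feeding ResNet50's expected 224x224 input."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

rng = np.random.default_rng(0)
low_res = rng.random((50, 50, 3))          # a 50x50 image like those in the question
resized = nearest_resize(low_res, 224, 224)
print(resized.shape)  # (224, 224, 3)
```

Upsampling adds no detail, so fine-tuning on the low-resolution data (or a model designed for small inputs) usually works better than relying on the frozen pre-trained features alone.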
HappyCoding
  • 5,029
  • 7
  • 31
  • 51