I'm trying to do some NLP on Stack Overflow posts to predict tags from the content of the title.

One constraint: I have to embed my sentences with the Sentence Transformers framework.

The idea is to embed the titles and use the embeddings as input to a neural network I built.

I'm not an expert in neural networks, so there are probably a lot of things I'm missing.

The problem is that the input fails to convert to a tensor. I tried the fix suggested in another post on SO, but I still get the same error.

Below is my code:

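import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import MultiLabelBinarizer

title_list = df.Title.tolist()

model = SentenceTransformer('paraphrase-distilroberta-base-v1')
embeddings = model.encode(title_list)
embeddings_list = [elem for elem in embeddings]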

df_embed = df 
df_embed['Embeddings'] = embeddings_list
df_embed.Embeddings = [np.asarray(x).astype('float32') for x in df_embed.Embeddings]

X = df_embed['Embeddings'].values
y = df_embed.Tags

mlb = MultiLabelBinarizer(classes=top_tags)
y_mlb = pd.DataFrame(mlb.fit_transform(y),columns=mlb.classes_, index=y.index)

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y_mlb, test_size = 0.3, random_state = 0)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size = 0.4, random_state = 0)

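# Assuming the Keras API bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential()
# Hidden layer
model.add(Dense(100, activation="relu"))
# Dropout for regularization
model.add(Dropout(0.3))
# Output layer: one sigmoid unit per tag (50 assumed to match len(top_tags))
model.add(Dense(50, activation="sigmoid"))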


model.compile(loss='binary_crossentropy',
              optimizer=Adam(0.01),
              metrics=['accuracy'])

hist = model.fit(X_train, y_train, batch_size=8, epochs=10, validation_split=0.1)

I got this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
  • Your code is not reproducible. What is `title_list`? What is `df_embed`? See how to create a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – o-90 Mar 09 '21 at 15:36
  • Can you pass a randomly generated y to confirm whether this issue comes from y_mlb or not? – Yefet Mar 09 '21 at 15:49
  • Not sure I understand. Do you mean a generated y after the mlb? – Célia Bayet Mar 09 '21 at 16:04
  • Instead of using y_mlb, which is the MultiLabelBinarizer output, just pass any random list of labels to confirm whether this issue comes from the y_mlb variable or not – Yefet Mar 09 '21 at 20:41
  • No, I have the same issue, so it's not from y_mlb :( – Célia Bayet Mar 10 '21 at 10:43
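
Since replacing y_mlb with random labels did not change the error, the input X looks like the more likely culprit. Below is a minimal sketch, using NumPy only, of what `df_embed['Embeddings'].values` probably looks like: a 1-D array of dtype `object` whose cells are themselves arrays, which matches the "Unsupported object type numpy.ndarray" message. The vector size and the random data are hypothetical stand-ins for the real embeddings, and the sketch is an assumption about the cause, not a confirmed diagnosis.

```python
import numpy as np

# Hypothetical stand-in for the real embeddings: 4 titles, 3 dimensions each
# (sizes chosen only for illustration).
rng = np.random.default_rng(0)
embeddings_list = [rng.random(3).astype("float32") for _ in range(4)]

# Mimic what a DataFrame column of arrays returns from .values:
# a 1-D object array where every cell holds one embedding vector.
X_object = np.empty(len(embeddings_list), dtype=object)
for i, vec in enumerate(embeddings_list):
    X_object[i] = vec

print(X_object.dtype, X_object.shape)  # object (4,)

# Stack the per-row vectors into a single 2-D float32 matrix.
X_dense = np.stack(X_object).astype("float32")
print(X_dense.dtype, X_dense.shape)  # float32 (4, 3)
```

If this is indeed the cause, building X with `np.stack(df_embed['Embeddings'].values)` before calling `train_test_split` should give Keras a regular float matrix of shape (n_samples, embedding_dim) instead of an object array.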
