1

I have a dataset, which contains of approx. 50k records of job titles. For example:

  1. Python Developer
  2. Java Developer
  3. Accountant
  4. Salesman
  5. Developer
  6. Programmer

etc. (Job titles are in german, but for explanation it doesn't matter)

I want to cluster similar jobs and for this i want to use word embeddings of each job title. For this I have chosen spaCy library

nlp = spacy.load("de_core_news_lg")

Almost every single word has an embedding representation. For example: python, developer, etc. But if I define them together as phrase('Python Developer'), this it doesn't have embedding representation.

How should I create word embeddings of the combination of words (like "Python Developer")?

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Daniel Yefimov
  • 860
  • 1
  • 10
  • 24

0 Answers0