I have a dataset that contains approx. 50k records of job titles. For example:
- Python Developer
- Java Developer
- Accountant
- Salesman
- Developer
- Programmer
etc. (The job titles are in German, but that doesn't matter for this explanation.)
I want to cluster similar jobs, and for this I want to use word embeddings of each job title. I have chosen the spaCy library for this:
import spacy
nlp = spacy.load("de_core_news_lg")
Almost every single word has an embedding representation, for example: python, developer, etc. But if I put them together as a phrase ('Python Developer'), the phrase does not have an embedding representation.
How should I create an embedding for a combination of words (like "Python Developer")?
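
One common approach to a single vector for a multi-word title is to average the vectors of its words; spaCy's `Doc.vector` defaults to such an average of the token vectors. The following is only a minimal sketch of that idea using made-up toy vectors, so it runs without any model. The `toy_vectors` table and the `phrase_vector` and `cosine` helpers are hypothetical names for illustration, not part of spaCy.

```python
from math import sqrt

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# Real vectors from a pretrained model have many more dimensions.
toy_vectors = {
    "python": [1.0, 0.0, 0.0],
    "java": [0.8, 0.2, 0.0],
    "developer": [0.0, 1.0, 0.0],
    "accountant": [0.0, 0.0, 1.0],
}


def phrase_vector(title):
    """Average the vectors of all known words in a title.

    Returns None if no word of the title has a vector.
    """
    words = [w for w in title.lower().split() if w in toy_vectors]
    if not words:
        return None
    dim = len(toy_vectors[words[0]])
    return [sum(toy_vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


python_dev = phrase_vector("Python Developer")
java_dev = phrase_vector("Java Developer")
accountant = phrase_vector("Accountant")

# With these toy vectors, the two developer titles end up closer to each
# other than either is to the accountant title.
print(cosine(python_dev, java_dev) > cosine(python_dev, accountant))
```

With the loaded spaCy pipeline, the corresponding call would be `nlp("Python Developer").vector`. The resulting per-title vectors could then be passed to a clustering algorithm. Plain averaging ignores word order and gives every word equal weight, so it is a baseline rather than a definitive solution.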