
I have a dataframe column that contains a lot of text. I would like to create another column that, for each row, extracts every word in that row that appears in a list of words I supply.

This is the list of words I supplied:

skills = ['Qaulitative research', 'wireframing', 'figma', 'frame x', 'miro', 'mockflow', 'User Persona', 'coding', 'empathy', 'sketch',
          'communication', 'problem solving']

def skill_list(data):
  for item in data:
    if item in data:
      return item

all_files['skills'] = all_files.apply(lambda x: skill_list(x['job_description'].split()), axis=1)

Here is my table (dataframe): The current dataset

I want it to look like this: Expected dataset

  • Please update your post with raw data (not images) so your example is reproducible. – Corralien Jan 07 '23 at 23:41
  • Please don’t post images of code, data or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`). [Why should I not upload images of ... when asking a question?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors-when-asking-a-question). – wwii Jan 08 '23 at 00:49
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Jan 08 '23 at 00:50

1 Answer


First, create a regex pattern from your skill words, then use `str.findall` to extract the skills:

pattern = fr"({'|'.join(skills)})"
df['Skills'] = df['Description'].str.findall(pattern).str.join(', ')

Output:

| Name | Description | Skills |
|---|---|---|
| Company A | User Persona Qauntitative research Qaulitative research wireframing | User Persona, Qaulitative research, wireframing |
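
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The one-row `df` is a hypothetical frame rebuilt from the output shown above, since the original data was only posted as images. The `re.escape` call, the `\b` word boundaries and the `re.IGNORECASE` flag are my own additions and may or may not suit your data.

```python
import re
import pandas as pd

# Skill list from the question, with every entry quoted.
skills = ['Qaulitative research', 'wireframing', 'figma', 'frame x', 'miro',
          'mockflow', 'User Persona', 'coding', 'empathy', 'sketch',
          'communication', 'problem solving']

# Hypothetical one-row frame mirroring the output above (an assumption,
# the real data was not posted as text).
df = pd.DataFrame({
    'Name': ['Company A'],
    'Description': ['User Persona Qauntitative research '
                    'Qaulitative research wireframing'],
})

# Escape each skill so regex metacharacters are treated literally, and
# wrap the alternation in word boundaries so only whole words match.
pattern = r'\b(?:' + '|'.join(re.escape(s) for s in skills) + r')\b'

# findall returns a list of matches per row; join turns it into a string.
df['Skills'] = (
    df['Description']
    .str.findall(pattern, flags=re.IGNORECASE)
    .str.join(', ')
)

print(df['Skills'].iloc[0])
```

With the word boundaries, a skill such as `sketch` should no longer match inside a longer word like `sketchy`, and the case-insensitive flag lets `Figma` match `figma`. Drop either one if you need exact, case-sensitive matching.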
Corralien