I have an Excel file with several rows. In each row there is a cell containing a longer text description. From this cell I want to extract all adjectives and write them into a new column of the same row.
I wrote code that uses openpyxl to manipulate the Excel file and spaCy to analyse the text.
However, the code below does not give the desired result: most rows in the new column remain empty, often the articles are copied instead of the adjectives, and sometimes the complete sentence is copied. It might be worth mentioning that the data in the Excel file is in German, but I load the corresponding German spaCy model (`de_core_news_sm`) in the code.
```python
import openpyxl
import spacy

# Path to the source file
source_file_path = r'C:\Users\USERNAME\Desktop\test.xlsx'

# Load the Excel file
workbook = openpyxl.load_workbook(source_file_path)
sheet = workbook.active  # Assumption: the worksheet is the active sheet

# Load the spaCy model for the German language
nlp = spacy.load('de_core_news_sm')

# Extract adjectives from a text
def extract_adjectives(text):
    doc = nlp(text)
    adjectives = [token.text for token in doc if token.pos_ == 'ADJ']
    return ', '.join(adjectives)

# Iterate through the cells in column A and extract adjectives.
# Without values_only=True, iter_rows yields Cell objects, so cell.row is available.
for (cell,) in sheet.iter_rows(min_col=1, max_col=1):
    if cell.value:  # If the cell is not empty
        adjectives = extract_adjectives(str(cell.value))
        sheet.cell(row=cell.row, column=5, value=adjectives)  # Column E corresponds to column 5

# Save the modified workbook
workbook.save(source_file_path)
print(f'Adjectives were copied to column E of the file {source_file_path}.')
```
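One thing worth checking in the row loop: with `values_only=True`, openpyxl's `iter_rows` yields tuples of plain cell values, so `row[0]` is the text itself (a `str`) and has no `.row` attribute from which to read the row number. A minimal sketch on an in-memory workbook, using hypothetical sample sentences, that shows the difference between the two modes:

```python
from openpyxl import Workbook

# Small in-memory sheet with hypothetical sample data in column A
wb = Workbook()
ws = wb.active
ws["A1"] = "Das rote Auto ist schnell."
ws["A2"] = "Ein kleines Haus."

# values_only=True: each row is a tuple of plain values (here: str), no row number
value_rows = list(ws.iter_rows(min_col=1, max_col=1, values_only=True))
print(value_rows)  # [('Das rote Auto ist schnell.',), ('Ein kleines Haus.',)]

# Default mode: each row is a tuple of Cell objects that know their own row number
for (cell,) in ws.iter_rows(min_col=1, max_col=1):
    print(cell.row, cell.value)
```

Iterating over `Cell` objects (or using `enumerate(..., start=1)` together with `values_only=True`) keeps the row number available for writing the result to column E.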
Thanks in advance
I have tried the code shown above, and I also tried several simpler example sentences in the Excel file.
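When testing with simple sentences, it may help to look at both the coarse tag (`token.pos_`) and the fine-grained tag (`token.tag_`) that the model assigns. German spaCy pipelines are trained on STTS-tagged data, where `ADJA` marks attributive adjectives ("rote" in "das rote Auto") and `ADJD` marks adverbial or predicative adjectives ("schnell" in "das Auto ist schnell"). Below is a minimal sketch that assumes the loaded model reports STTS tags in `tag_`. It counts a token as an adjective if either tag says so. The name `extract_adjectives_by_tag` and the `FakeToken` stand-ins are hypothetical, and the tags on the fake tokens are hand-written for illustration rather than real model output.

```python
from typing import NamedTuple

# STTS fine-grained tags for adjectives (assumption: exposed via token.tag_)
ADJECTIVE_TAGS = {"ADJA", "ADJD"}

def extract_adjectives_by_tag(tokens) -> str:
    """Join the text of tokens whose coarse POS is ADJ or whose STTS tag is ADJA/ADJD."""
    return ", ".join(
        t.text for t in tokens if t.pos_ == "ADJ" or t.tag_ in ADJECTIVE_TAGS
    )

class FakeToken(NamedTuple):
    """Hypothetical stand-in with the same attributes as a spaCy Token."""
    text: str
    pos_: str
    tag_: str

# Hand-written tags, not model output; "schnell" is given a non-ADJ coarse tag
# on purpose to show that the STTS tag alone is enough to keep it.
sample = [
    FakeToken("Das", "DET", "ART"),
    FakeToken("rote", "ADJ", "ADJA"),
    FakeToken("Auto", "NOUN", "NN"),
    FakeToken("ist", "AUX", "VAFIN"),
    FakeToken("schnell", "ADV", "ADJD"),
]
print(extract_adjectives_by_tag(sample))  # rote, schnell
```

With the real model, the same function could be called as `extract_adjectives_by_tag(nlp(text))`, since spaCy `Token` objects expose `text`, `pos_` and `tag_`. Printing `(token.text, token.pos_, token.tag_)` for each token of a test sentence shows directly which words the model treats as adjectives.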