I am working with a data set with ordinal variables as well as a column with text. In general, I would like to add columns that are results of a text mining exercise, maintaining the table structure.
For example, I have imported a CSV file, data-subset.csv, and obtained a data frame called datacsv:
datacsv <- read.csv("data-subset.csv", header = TRUE, sep = ";")
The third column, tekst, contains text. I would like to search that text for numbers (which will usually lie between 0 and 1) that appear in the context of "fte", and add these numbers as a new column, fte. See:
> luid titel tekst
>1 47300 docent wiskunde De Stichting Openbaar Voortgezet Onderwijs 0,65 fte voltijd niveau: havo vwo
>2 43701 docent natuurkunde Speciaal onderwijs fulltime 2015 2016 fte 0,77 Haarlem
>3 43702 assistent basisonderwijs Amsterdam fte 0,5
I have installed packages such as tm and quanteda:
install.packages(c("tm", "quanteda"))
library("tm")
library("quanteda")
I have tried various kwic statements, such as the following, without satisfying results:
datacsv["fte"] <- kwic(datacsv$tekst, "fte", 4)
Does anyone know how to mine the text column and add the results as a column (or multiple columns)?
Thanks!
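
For reference, here is a minimal sketch of the kind of result described above, written in base R with regular expressions instead of kwic (kwic returns one row per match rather than one value per row of datacsv, which may be why the assignment does not fit). The helper name extract_fte is hypothetical, the sample data frame is rebuilt by hand from the three rows shown, and the split between titel and tekst in those rows is an assumption. The sketch also assumes that the wanted number sits directly before or directly after "fte", and that values outside 0 to 1 (such as the year 2016) should be ignored.

```r
# Sample rows copied from the question; in practice datacsv comes from read.csv().
# Assumption: the boundary between titel and tekst is guessed from the printout.
datacsv <- data.frame(
  luid  = c(47300, 43701, 43702),
  titel = c("docent wiskunde", "docent natuurkunde", "assistent basisonderwijs"),
  tekst = c(
    "De Stichting Openbaar Voortgezet Onderwijs 0,65 fte voltijd niveau: havo vwo",
    "Speciaal onderwijs fulltime 2015 2016 fte 0,77 Haarlem",
    "Amsterdam fte 0,5"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one number per text, or NA if none is found.
extract_fte <- function(text) {
  text <- as.character(text)           # also works if the column is a factor
  num  <- "[0-9]+(?:[.,][0-9]+)?"      # e.g. 1, 0,65 or 0.65

  # Look for "<number> fte" and "fte <number>" separately, because one
  # occurrence of "fte" can have a number on both sides (row 2 above).
  before <- regmatches(text, gregexpr(paste0(num, "\\s*fte\\b"), text,
                                      perl = TRUE, ignore.case = TRUE))
  after  <- regmatches(text, gregexpr(paste0("\\bfte\\s*", num), text,
                                      perl = TRUE, ignore.case = TRUE))

  vapply(seq_along(text), function(i) {
    hits <- c(before[[i]], after[[i]])
    # Strip "fte" and whitespace, then turn the decimal comma into a dot.
    digits <- gsub("fte|[[:space:]]", "", hits, ignore.case = TRUE)
    values <- as.numeric(sub(",", ".", digits, fixed = TRUE))
    # Keep only plausible fte values between 0 and 1.
    values <- values[!is.na(values) & values >= 0 & values <= 1]
    if (length(values) > 0) values[1] else NA_real_
  }, numeric(1))
}

# Add the result as a new column; the table structure is otherwise unchanged.
datacsv$fte <- extract_fte(datacsv$tekst)
# datacsv$fte is now c(0.65, 0.77, 0.5)
```

Because the helper returns exactly one value per row, further text-derived columns can be added the same way, one helper per column. If several valid numbers surround "fte" in one text, this sketch keeps only the first; that choice is arbitrary and may need adjusting for the full data set.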