I am trying to cluster a set of ideas, each identified by a reference. Each row contains one idea; the CSV looks like this:
library(tm)
setwd("/Users/Bif/Documents")
# read the data
data <- read.csv("ideas.csv", header = TRUE, sep = ";")
> data
Reference idea
1 FI-000786 AIRE DE DETENTE LES BEAUX JOURS ARRIVENT etc…
2 FI-000754 Tiroirs de rangement des véhicules les tiroirs etc…
3 FI-000740 EVITER LES PI Vaines sur sur les dossiers MOAR etc..
4 FI-000717 Glossaire de sigleset trigrammes ucf beaucoup etc…
5 FI-000705 Transport de l'escabeau Bruit et accès de etc…
6 FI-000669 economie de papier C.Q.P (avis de passage avec etc…
7 FI-000653 UTILISATION D 'UNE CAMERA D'INSPECTION etc..
8 FI-000649 faciliter les déclarations de SD par les agents etc…
9 FI-000639 Récup Embase téléreport sur coffret Des coffrets etc…
I'm quite new to R. I've been trying the text-mining tm package, and I can analyze the term frequencies of the second column via a DocumentTermMatrix. The problem is that with this process I can only analyze the column as if it were one plain text, not as separate texts (one per reference) that I could then compare to tell which references are similar to each other.
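To make the goal concrete, here is a minimal sketch of the per-reference comparison described above. It rests on several assumptions that are not part of my actual setup: the data frame is an invented stand-in for ideas.csv (hypothetical references and texts), each row is treated as its own document, and cosine distance, average linkage and k = 2 are arbitrary illustrative choices rather than a recommended method.

```r
library(tm)

# Hypothetical stand-in for ideas.csv: same two columns, invented rows.
ideas <- data.frame(
  Reference = c("X-1", "X-2", "X-3", "X-4"),
  idea = c("save paper print less paper",
           "print on both sides to save paper",
           "storage drawers for vehicles",
           "vehicles need better storage drawers"),
  stringsAsFactors = FALSE
)

# One document per row, instead of one big text.
corpus <- VCorpus(VectorSource(ideas$idea))

# Rows are documents, columns are terms (tm drops words shorter than
# 3 characters by default).
dtm <- DocumentTermMatrix(corpus,
                          control = list(tolower = TRUE,
                                         removePunctuation = TRUE))
m <- as.matrix(dtm)
rownames(m) <- ideas$Reference

# Cosine distance between rows (an assumption; other distances may fit better).
norms <- sqrt(rowSums(m^2))
sim <- (m %*% t(m)) / (norms %o% norms)
d <- as.dist(1 - sim)

# Hierarchical clustering, then cut the tree into k groups (k = 2 is arbitrary).
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

With the real data, the corpus would be built from data$idea instead, and for French text something like stopwords = stopwords("french") could probably be added to the control list. plot(hc) draws the dendrogram with the references as leaf labels.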
I've seen a topic about the qdap package that might get close to what I am looking for (although I can't get the package to load, and I don't know why yet), but I still can't figure out how I would cluster the references (dates, in the linked example) together.
I've been searching the web quite a lot, and I feel stuck now...
Thanks a lot.