Questions tagged [rweka]

RWeka - an R package that makes Weka functions accessible from R

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, and graphics. Weka is an open-source machine learning library written in Java. RWeka is an R package that provides an interface to most Weka functions, making them accessible from R.

104 questions
6
votes
2 answers

Creating N-Grams with tm & RWeka - works with VCorpus but not Corpus

Following the many guides to creating bigrams with the 'tm' and 'RWeka' packages, I was frustrated that only 1-grams were being returned in the TDM. Through much trial and error I discovered that it works properly when using 'VCorpus'…
Paul_J
  • 61
  • 1
  • 4
6
votes
1 answer

In RStudio I am getting Java Out of Memory (for RWeka)

OK, this looks familiar from the Java world: where and how can I allow more memory for RWeka in RStudio? Error in .jcall("RWekaInterfaces", "[S", "tokenize", .jcast(tokenizer, : java.lang.OutOfMemoryError: GC overhead limit exceeded. Not sure how R…
5
votes
1 answer

Document-term matrix in R - bigram tokenizer not working

I am trying to make 2 document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix is currently just identical to the unigram matrix, and I'm not sure why. The code: docs<-Corpus(DirSource("data",…
filaments
  • 197
  • 1
  • 15
5
votes
1 answer

Knitr providing different results than RStudio

I'm doing some initial text mining with 'tm' and 'RWeka', using knitr for reproducibility. I'm trying to obtain a term-document matrix for a corpus based on two text files, and the process gives different results when I run the code in RStudio and…
ines vidal
  • 113
  • 1
  • 6
4
votes
1 answer

Using RWeka M5P in RStudio yields java.lang.NoClassDefFoundError: no/uib/cipr/matrix/Matrix

I have an R script that uses RWeka's M5P algorithm and used to work fine. For reasons unknown to me, it stopped working properly, and now I get Error in .jcall(o, "Ljava/lang/Class;", "getClass") : java.lang.NoClassDefFoundError:…
mondano
  • 827
  • 10
  • 29
4
votes
2 answers

Generating all word unigrams through trigrams in R

I am trying to generate a list of all unigrams through trigrams in R to, eventually, make a document-phrase matrix with columns including all single words, bigrams, and trigrams. I expected to find an easy package for this, and have not succeeded. …
miratrix
  • 191
  • 2
  • 12
3
votes
1 answer

RWeka filter ReplaceMissingValues not working

I'm currently doing some exploration with Weka from R using RWeka. I'm trying to replace some missing values (that I intentionally added) with the ReplaceMissingValues unsupervised filter, but when I apply it only a portion of the data frame comes…
3
votes
1 answer

2-gram and 3-gram instead of 1-gram using RWeka

I am trying to extract 1-grams, 2-grams and 3-grams from the train corpus, using the RWeka NGramTokenizer function. Unfortunately, I am getting only 1-grams. Here is my code: train_corpus # clean-up cleanset1<- tm_map(train_corpus, tolower) cleanset2<-…
3
votes
0 answers

Rweka Error in model.frame.default(formula = class ~ ., data = rtrain) : object is not a matrix

I'm new to RWeka and R. I'm using KNN to train on the data with the following code: library(RWeka) trainfile='/poker-hand-training-true.arff' rtrain <- as.data.frame(read.arff(file=trainfile)) classifier <- IBk(class ~., data =…
Claire Liu
  • 31
  • 1
  • 2
3
votes
1 answer

Pruning rule based classification tree (PART algorithm)

I am using the PART algorithm in R (via the RWeka package) for multi-class classification. The target attribute is the time bucket in which an invoice will be paid by the customer (e.g. 7-15 days, 15-30 days, etc.). I am using the following code for fitting and predicting…
user3697157
  • 90
  • 1
  • 1
  • 7
3
votes
2 answers

R and tm package: create a term-document matrix with a dictionary of one or two words?

Purpose: I want to create a term-document matrix using a dictionary which has compound words, or bigrams, as some of the keywords. Web Search: Being new to text-mining and the tm package in R, I went to the web to figure out how to do this. …
b_ron_
  • 197
  • 1
  • 1
  • 10
2
votes
0 answers

In WEKA, J48, does setting the minNumObj to 1 make sense?

When I was looking for an explanation of minNumObj in WEKA, I came across this: "the minimum number of instances per leaf is better thought of as the minimum amount of data separation per branching". In this sense, I was wondering if setting the…
Habtamu S
  • 21
  • 1
2
votes
1 answer

In R, how do I retrieve information from an XMeans output

I have a data frame, df, containing the x and y coordinates of a bunch of points. Here's an excerpt: > tail(df) x y 1495 0.627174 0.120215 1496 0.616036 0.123623 1497 0.620269 0.122713 1498 0.630231 0.110670 1499 0.611844…
2
votes
1 answer

How to find the Markov blanket for a node?

I want to do feature selection using the Markov blanket algorithm. I am wondering whether there is any API in Java/Weka or in Python to find the Markov blanket. Consider that I have a dataset. The dataset has a number of variables and one target variable. I want…
Rashida Hasan
  • 149
  • 3
  • 13
2
votes
1 answer

R: Obtaining Single Term Frequencies instead of Bigrams

Here is the code I use to create bigrams with a frequency list: library(tm) library(RWeka) #data <- myData[,2] tdm.generate <- function(string, ng){ # tutorial on rweka - http://tm.r-forge.r-project.org/faq.html corpus <-…
J Gisid
  • 23
  • 2