Find the related items in r programming for data mining

Question

I am working on data mining in R programming and I'm using RStudio. My dataset looks like this:

I've used 'yes' or 'no' in place of a disease name in some places, just to check whether it works for those values.

Here you can see that a patient can have several diseases/diagnoses. I am trying to use association rules to display the diseases that a person suffers from along with HTN. I've written the following code:

mytestdata <- read.csv("D:/Senior Thesis/Program/test.csv", header = TRUE,
                       colClasses = "factor", sep = ",")


library(arules)

myrules <- apriori(mytestdata,
                   parameter = list(supp = 0.1, conf = 0.1, maxlen=10, minlen=2),
                   appearance = list(rhs=c("Disease.1=HTN")))

summary(myrules)
inspect(myrules)

But I'm not getting any disease name in the column lhs; you can see that in the following image:

Please help me so that lhs shows the name of the disease associated with rhs which is Disease.1=HTN.

Welcome to StackOverflow! Please check out this post on [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) - a screenshot gives us a sense of what the data look like, but we can't easily copy it into R for problem-solving — Punintended, Apr 10 '18 at 18:23
Thank you so much for your reply. I'm trying my best to reproduce it but I'm facing some problems. Can you please still help me? I really need to know and it would be really kind of you if you can help me identify the mistakes I'm making in my code. You don't have to do anything with the age and gender. You can skip these two rows but what I'm really concerned about are the columns Disease 1, Disease 2, Disease 3 and Disease 4. Thanks in advance. — Israt, Apr 10 '18 at 18:43
If you could change your screenshot to a `dput(head(mytestdata))`, that would be a great first step — Punintended, Apr 10 '18 at 18:46
@lsrat: The fact that Punintended needed to tell you to put in output from `dput` means that you did not actually go to the suggested SO link and read it thoroughly. I have little sympathy for people who fail to take advantage of useful advice and instead just beg for "kindness". At SO we expect that you will put in effort. — IRTFM, Apr 10 '18 at 19:00
I read it. Actually I'm very new to this; I've gone through them but I'm still getting so many errors. Maybe I'm doing it wrong. I did what Punintended told me but haven't been able to create the tabular form of the data yet. — Israt, Apr 10 '18 at 19:04

score 0 · Answer 1 · answered Apr 10 '18 at 21:48

Your code treats missing values (e.g. cell E4 in the Excel sheet) as a factor level. You can prevent this behaviour by specifying the NA string in the read.csv call.

mytestdata <- read.csv("D:/Senior Thesis/Program/test.csv", header = TRUE,
                       colClasses = "factor", sep = ",", na.strings = "")
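
To see the effect of na.strings without the original file, here is a minimal sketch on a made-up inline CSV. The column names and values are assumptions for illustration only, not the asker's data.

```r
# Hypothetical stand-in for test.csv: some disease cells are left empty.
csv_text <- "Disease.1,Disease.2,Disease.3
HTN,DM,
HTN,,
DM,HTN,CKD"

# Default behaviour: an empty cell in a factor column becomes the level "".
with_blank <- read.csv(text = csv_text, header = TRUE, colClasses = "factor")

# With na.strings = "": empty cells are read as NA and are not a level.
with_na <- read.csv(text = csv_text, header = TRUE, colClasses = "factor",
                    na.strings = "")

print(levels(with_blank$Disease.3))   # contains the empty string as a level
print(levels(with_na$Disease.3))      # only real disease names remain
print(sum(is.na(with_na$Disease.3)))  # number of empty cells now read as NA
```

With the empty string as a level, apriori would see an item such as Disease.3= in every patient that has no third diagnosis, which can crowd out the real disease names on the lhs.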

score 0 · Answer 2 · answered Apr 12 '18 at 20:53

It would, if you had more data. There are just 3 rows that satisfy your rhs!

Note that you do get Disease.2=yes.

But I assume you want to ignore order on the diseases...
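
One possible way to ignore the column order is to turn each patient's row into an unordered set of diseases and mine those sets instead of the data frame. The sketch below is only an illustration under assumptions: the patients data frame is made up, and the item label "HTN" on the rhs relies on the items being bare disease names after this reshaping rather than Disease.1=HTN. The arules part runs only if the package is installed.

```r
# Hypothetical toy data: the same disease can appear in any column.
patients <- data.frame(
  Disease.1 = c("HTN", "DM",  "HTN", "CKD"),
  Disease.2 = c("DM",  "HTN", "CKD", NA),
  Disease.3 = c(NA,    "CKD", "DM",  NA),
  stringsAsFactors = FALSE
)

# One unordered item set per patient: drop NAs and duplicate entries.
baskets <- lapply(seq_len(nrow(patients)), function(i) {
  row <- unlist(patients[i, ], use.names = FALSE)
  unique(row[!is.na(row)])
})

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # Coerce the list of item sets to a transactions object.
  trans <- as(baskets, "transactions")
  # Keep HTN on the rhs and allow every other item on the lhs.
  rules <- apriori(trans,
                   parameter = list(supp = 0.1, conf = 0.1, minlen = 2),
                   appearance = list(rhs = "HTN", default = "lhs"))
  inspect(rules)
}
```

With items that no longer carry a column prefix, a rule such as {DM} => {HTN} counts a patient regardless of which column each diagnosis was entered in.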

Has QUIT--Anony-Mousse
