I am working on data mining in R, using RStudio. My dataset looks like this:

This is an example of a dataset of medical data

I've used 'yes' 'no' instead of any other disease name in some places just to check if it works for 'yes' or 'no'.

Here you can see that a patient can have several diseases/diagnoses. I am trying to use association rules to find the diseases that a person suffers from along with HTN. I've written the following code:

mytestdata <- read.csv("D:/Senior Thesis/Program/test.csv", header=T,
                       colClasses = "factor", sep = ",")


library(arules)

myrules <- apriori(mytestdata,
                   parameter = list(supp = 0.1, conf = 0.1, maxlen=10, minlen=2),
                   appearance = list(rhs=c("Disease.1=HTN")))

summary(myrules)
inspect(myrules)

But I'm not getting any disease name in the column lhs; you can see that in the following image:

The results shown

Please help me so that lhs shows the name of the disease associated with rhs which is Disease.1=HTN.

duckmayr
Israt
  • Welcome to StackOverflow! Please check out this post on [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) - a screenshot gives us a sense of what the data look like, but we can't easily copy it into R for problem-solving – Punintended Apr 10 '18 at 18:23
  • Thank you so much for your reply. I'm trying my best to reproduce it but I'm facing some problems. Can you please still help me? I really need to know and it would be really kind of you if you can help me identify the mistakes I'm making in my code. You don't have to do anything with the age and gender. You can skip these two rows but what I'm really concerned about are the columns Disease 1, Disease 2, Disease 3 and Disease 4. Thanks in advance. – Israt Apr 10 '18 at 18:43
  • If you could change your screenshot to a `dput(head(mytestdata))`, that would be a great first step – Punintended Apr 10 '18 at 18:46
  • @Israt: The fact that Punintended needed to tell you to put in output from `dput` means that you did not actually go to the suggested SO link and read it thoroughly. I have little sympathy for people who fail to take advantage of useful advice and instead just beg for "kindness". At SO we expect that you will put in effort. – IRTFM Apr 10 '18 at 19:00
  • I read it. Actually I'm very new to this, and I've gone through the examples but I'm still getting so many errors. Maybe I'm doing it wrong. I did what Punintended told me but haven't been able to create the tabular form of the data yet. – Israt Apr 10 '18 at 19:04

2 Answers

Your code treats missing values (e.g. cell E4 in the Excel sheet) as a factor level of their own. You can prevent this by telling `read.csv` which string represents NA, via the `na.strings` argument:

mytestdata <- read.csv("D:/Senior Thesis/Program/test.csv", header=T,
                   colClasses = "factor", sep = ",", na.strings = "")
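
To see the difference without the original file, here is a minimal sketch using base R only. The inline CSV text and its values are made up for illustration and merely stand in for `test.csv`; the column names are assumptions modeled on the question.

```r
# Hypothetical inline CSV standing in for test.csv (values are made up).
# The second data row has an empty Disease.2 cell.
csv_text <- "Disease.1,Disease.2
HTN,DM
HTN,
Asthma,yes"

# Default behaviour: the empty cell is kept as the factor level ""
with_blank <- read.csv(text = csv_text, colClasses = "factor")

# With na.strings = "": the empty cell is read as NA instead
with_na <- read.csv(text = csv_text, colClasses = "factor", na.strings = "")

levels(with_blank$Disease.2)   # contains "" as one of the levels
levels(with_na$Disease.2)      # "" is no longer a level
sum(is.na(with_na$Disease.2))  # counts the empty cell as missing
```

Comparing the two `levels()` calls on your own data is a quick way to check whether blank cells are being counted as a category.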
Jakub Buček

It would, if you had more data. There are just 3 rows that satisfy your rhs!

Note that you do get Disease.2=yes.

But I assume you want to ignore order on the diseases...
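
One possible reading of "ignore order" is to treat each patient as an unordered set of diseases rather than as four positional columns. The sketch below is an assumption about what that could look like, not the answerer's own method; the `patients` data frame and its values are hypothetical, and the `arules` part only runs if the package is installed.

```r
# Hypothetical data shaped like the question's disease columns (values made up)
patients <- data.frame(
  Disease.1 = c("HTN", "DM", "HTN"),
  Disease.2 = c("DM", "HTN", NA),
  stringsAsFactors = FALSE
)

# Build one vector of distinct, non-missing diseases per patient,
# so "HTN" is the same item no matter which column it appeared in
baskets <- lapply(seq_len(nrow(patients)), function(i) {
  row <- unlist(patients[i, ], use.names = FALSE)
  unique(row[!is.na(row)])
})

# Optional: mine rules on the baskets if arules is available
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  rules <- apriori(trans,
                   parameter = list(supp = 0.1, conf = 0.1, minlen = 2),
                   appearance = list(rhs = "HTN", default = "lhs"))
  inspect(rules)
}
```

With this layout the rhs item is simply `"HTN"` instead of `"Disease.1=HTN"`, because items are no longer tied to a column name.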

Has QUIT--Anony-Mousse