When I imported the following data, saved as a UTF-8 encoded txt file

1   test1
1   test2
2   test1
2   test3

into RStudio, I had issues with the byte order mark (BOM) characters showing up in the resulting table. Below is the code that I used to import the data.

library(arules)
library(arulesViz)

txn <- read.transactions("r-test.txt", rm.duplicates = FALSE, format = "single", sep = "\t", cols = c(1, 2))
inspect(txn)

The resulting import looked like the following:

  items         transactionID
1 {test2}       1            
2 {test1,test3} 2            
3 {test1}       1 
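
A likely explanation for transaction 1 being split in two is that the BOM bytes end up attached to the first transaction ID, so the first row no longer matches the plain 1 on the second row. The sketch below is one way to check whether a file starts with a UTF-8 BOM. It uses base R only; `has_utf8_bom` and `demo_file` are hypothetical names made up for this illustration, and the demo file just mimics the data above.

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Build a small demo file that mimics the data above, with a BOM in front.
demo_file <- tempfile(fileext = ".txt")
con <- file(demo_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("1\ttest1\n1\ttest2\n2\ttest1\n2\ttest3\n"), con)
close(con)

# Check the demo file for a leading BOM.
has_utf8_bom(demo_file)
```

Pointing `has_utf8_bom()` at the real input file (here "r-test.txt") should show whether the BOM is what is breaking the first ID.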
TsTeaTime

2 Answers

I found that saving the file as an ANSI-encoded txt file cleared the issue up:

  items         transactionID
1 {test1,test2} 1            
2 {test1,test3} 2  

You can use the following R code to convert your file to ANSI format:

writeLines(iconv(readLines("Old File Name"), from = "UTF8", to = "ANSI_X3.4-1986"), 
           file("New File Name", encoding="ANSI_X3.4-1986"))

Hope this helps someone else if they have the same issue.
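
If the file needs to stay UTF-8, another option that might work is to strip only the BOM instead of converting the whole file. This is a minimal sketch using base R: `file()` accepts `encoding = "UTF-8-BOM"` for reading and drops a leading BOM if one is present. `strip_utf8_bom`, `demo_file` and `clean_file` are hypothetical names for this illustration, and I have only walked through it on the small demo data below, not on real transaction files.

```r
# Hypothetical helper: copy a text file while dropping a leading UTF-8 BOM.
strip_utf8_bom <- function(in_path, out_path) {
  # "UTF-8-BOM" tells the connection to skip the BOM when reading.
  con <- file(in_path, open = "r", encoding = "UTF-8-BOM")
  on.exit(close(con))
  lines <- readLines(con)
  writeLines(lines, out_path)
  invisible(out_path)
}

# Demo input that mimics the question's data, written with a BOM in front.
demo_file <- tempfile(fileext = ".txt")
con <- file(demo_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("1\ttest1\n1\ttest2\n2\ttest1\n2\ttest3\n"), con)
close(con)

clean_file <- tempfile(fileext = ".txt")
strip_utf8_bom(demo_file, clean_file)

# The cleaned file can then be imported as in the question:
# txn <- read.transactions(clean_file, format = "single", sep = "\t", cols = c(1, 2))
```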

TsTeaTime

read.transactions also has an encoding argument. Try setting it to "UTF8".

read.transactions(file, format = c("basket", "single"), sep = "",
              cols = NULL, rm.duplicates = FALSE, 
              quote = "\"'", skip = 0, 
              encoding = "unknown")
Michael Hahsler
  • Hi Michael, I actually tried setting the encoding to UTF8 as well as UTF8 with BOM. However, neither of these corrected the issue. Thank you for the answer, and let me know if the encoding is working for you. – TsTeaTime Mar 26 '16 at 23:30
  • Looks like I need to add encoding also to scan in read.transactions. I will try and do that in the development version of arules on github. Not quite sure if that solves the problem. – Michael Hahsler Mar 28 '16 at 23:41
  • Thanks, that should work perfectly. I'll give it a try once it's added. – TsTeaTime Mar 29 '16 at 01:30