I'm trying to read a file that uses :: as the column separator:

userID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275

Here is my code:

tr = read.table("/home/user/ml-1m/ratings.dat",sep = ":"  )
print(tr)

The result is:

   V1 V2   V3 V4 V5 V6        V7
1   2 NA  318 NA  5 NA 978298413
2   2 NA 1207 NA  4 NA 978298478
3   2 NA 1968 NA  2 NA 978298881
4   2 NA 3678 NA  3 NA 978299250
5   2 NA 1244 NA  3 NA 978299143
6   2 NA  356 NA  5 NA 978299686
7   2 NA 1245 NA  2 NA 978299200

I don't want the NA columns.
But if I set sep = "::", I get the error: invalid 'sep' value: must be one byte. How can I fix this?

Roland
user2492364
  • Did you check the content of `tr`? Are they the expected values? –  Apr 20 '15 at 06:52
  • Your error is not *reproducible*. It's very hard to help you when we can't run your example. Please see [this page](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on the subject and modify accordingly. – Anders Ellern Bilgrau Apr 20 '15 at 07:05
  • Your example is less reproducible than before editing. –  Apr 20 '15 at 07:48

1 Answer

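The text file import functions (read.table and its relatives) only accept a single-byte column separator. However, you can tell read.table to skip columns on import with its colClasses parameter (see the help file). With sep = ":", each "::" produces an empty field between the real ones; colClasses = c(NA, "NULL") is recycled across all columns, so every second column is dropped: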

read.table(text = "userID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275", 
           sep = ":", colClasses = c(NA, "NULL"),
           header = TRUE)

#  userID MovieID Rating Timestamp
#1      1    1193      5 978300760
#2      1     661      3 978302109
#3      1     914      3 978301968
#4      1    3408      4 978300275
Roland
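
A different route, not part of the answer above, is to collapse each "::" into a single-byte separator before parsing. The following is only a minimal sketch under stated assumptions: `raw_lines` is a hypothetical in-memory stand-in for the file, and a tab is assumed never to occur in the data.

```r
# Hypothetical stand-in for the file contents (no header line).
# For a real file, something like readLines("/home/user/ml-1m/ratings.dat")
# would supply the same kind of character vector.
raw_lines <- c("1::1193::5::978300760",
               "1::661::3::978302109",
               "1::914::3::978301968",
               "1::3408::4::978300275")

# Replace every literal "::" with a tab so a one-byte separator suffices.
tabbed <- gsub("::", "\t", raw_lines, fixed = TRUE)

# Parse the rewritten lines; the column names are chosen for illustration.
ratings <- read.table(text = tabbed, sep = "\t", header = FALSE,
                      col.names = c("user", "movie", "rating", "timestamp"))
print(ratings)
```

This avoids the empty in-between columns entirely, at the cost of holding the whole file in memory as text before parsing.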
  • Thanks. I still have a question: the raw data doesn't have the header `userID::MovieID::Rating::Timestamp`, so I want to use `col.names=c('user','movie','rating','timestamp')` in `read.table`. But it seems I need to use `col.names=c('user','NA','movie','NA','rating','NA','timestamp')` to account for the NA columns. How can I solve this? – user2492364 Apr 20 '15 at 08:17
  • I'd set the column names after import. – Roland Apr 20 '15 at 08:18
  • I mean that ratings.dat contains only numbers: `1::1193::5::978300760 1::661::3::978302109 1::914::3::978301968 1::3408::4::978300275` – user2492364 Apr 20 '15 at 08:24
  • I understood that. Just use `names(yourDF) <- c('user','movie','rating','timestamp')`. – Roland Apr 20 '15 at 08:26
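
Putting the answer and the last comment together for the headerless case might look like the sketch below. The inline `text` argument is a stand-in for the real file, and the data frame name `ratings` is chosen here for illustration only.

```r
# Headerless sample rows, passed inline instead of a file path.
ratings <- read.table(text = "1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275",
                      sep = ":", colClasses = c(NA, "NULL"),
                      header = FALSE)

# Only the four real columns survive the import, so four names are enough.
names(ratings) <- c("user", "movie", "rating", "timestamp")
print(ratings)
```

Because the empty columns are dropped during import, there is no need for placeholder entries in the names vector.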