Here is the data:

1:
30878
2647871
1283744
2488120
317050
1904905
1989766
14756
1027056
1149588
1394012
1406595
2529547
1682104
2625019
2603381
1774623
470861
712610
1772839
1059319
2380848
548064
10:
1952305
1531863
1000:
2326571
977808
1010534
1861759
79755
98259
1960212
97460
2623506
2409123
...

A number followed by ':' is a movieID, and the lines that follow it are the customerIDs for that movie. I want to write a loop that checks whether each line contains ':'. Here is the code I tried:

for (i in 1:length(line)){
  #print(line[i])
  if(grep(':', line[i])==1 ){
    movieID<-as.integer(substr(line[i],1,nchar(line[i])-1)  )
    next
  } 
  else{
    customerID<-as.integer(line[i])
    #do something
  }
}

When I run this code, I get the error: argument is of length zero. I searched for this error and then changed the if statement to:

if( !is.na(line[i]) && nchar(line[i])>1 && grep(':', line[i])==1 )

There is still an error: missing value where TRUE/FALSE needed

I can't solve it. This is the code I ran:

for (i in 1:27){
  #print(testData[i])
  if(grep(':', testData[i])==1 ){
    movieID<-as.integer(substr(testData[i],1,nchar(testData[i])-1)  )
    print(testData[i])
    next
  }else{
    customerID<-as.integer(testData[i])
    print(movieID)
    print(customerID)
 #print(subset.data.frame(mydata[[movieID]],mydata[[movieID]]$customerID==customerID) )
  }
}

Here is the output and the error:

[1] "1:"
Error in if (grep(":", testData[i]) == 1) { : argument is of length zero

It looks like the error occurs at the else statement.

cloudiyang
  • Can you add print statements to see on which line your code is failing? The logic looks correct to me (and I tested each piece here locally). Maybe your file has some bad data somewhere. Perhaps it is failing because of an EOF case? – Tim Biegeleisen May 05 '17 at 05:24
  • I have updated the question, and I'm sure the data is correct. – cloudiyang May 05 '17 at 06:02

2 Answers

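The error occurs because grep returns integer(0) if the string you are looking for is not present. Comparing that with 1 gives logical(0), and if() needs a condition of length one, hence "argument is of length zero". So your loop fails on i=2 (the first line without a ':'), as you can see when you look at the value of i after the loop breaks. The && version in the question fails for the same reason: TRUE && logical(0) evaluates to NA, which gives "missing value where TRUE/FALSE needed".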
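A minimal sketch of the difference, using a hypothetical two-element vector x rather than the original data: grep returns the positions of the matching elements, so a string without a match yields a zero-length result, while grepl always returns exactly one TRUE or FALSE per element.

```r
# Hypothetical sample: one movieID line and one customerID line
x <- c("1:", "30878")

grep(":", x[1])        # 1: position of the match within the length-one input
grep(":", x[2])        # integer(0): no match, so nothing is returned
grep(":", x[2]) == 1   # logical(0): comparing an empty vector gives an empty result

grepl(":", x[1])       # TRUE
grepl(":", x[2])       # FALSE: one value per element, so it is safe inside if()
```

If you need to keep grep for some reason, testing length(grep(":", x[2])) > 0 is another way to get a single TRUE or FALSE.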

If you use grepl instead, your loop works as planned (building on @Akarsh Jain's answer):

movieID<-array() 
customerID<-array()

for (i in 1:length(testData)){

  if(grepl(':', testData[i])){
    movieID[i]<-as.integer(substr(testData[i],1,nchar(testData[i])-1)  )
    next
  } else{
    customerID[i]<-as.integer(testData[i])

  }
}

Of course, the question is how useful this is. I assume you want to somehow split your data on movieID, which you can do easily using dplyr and tidyr:

library(dplyr)
library(tidyr)
# put your testData in a data frame (keep the strings as character, not factor)
testDf <- data.frame(customerID = testData, stringsAsFactors = FALSE)

newDf <- testDf %>%
  # identify rows with ":" and keep the number in front of the colon
  mutate(movieID = ifelse(grepl(":", customerID), sub(":", "", customerID), NA)) %>%
  # fill all NA values in movieID with the previous non-NA value
  fill(movieID) %>%
  # remove lines where customerID has a ":"
  filter(!grepl(":", customerID))

output:

    customerID movieID
1    30878       1
2  2647871       1
3  1283744       1

dummy data

testData <- read.table(text='1:
30878
                                 2647871
                                 1283744
                                 2488120
                                 317050
                                 1904905
                                 1989766
                                 14756
                                 1027056
                                 1149588
                                 1394012
                                 1406595
                                 2529547
                                 1682104
                                 2625019
                                 2603381
                                 1774623
                                 470861
                                 712610
                                 1772839
                                 1059319
                                 2380848
                                 548064
                                 10:
                                 1952305
                                 1531863
                                 1000:
                                 2326571
                                 977808
                                 1010534
                                 1861759
                                 79755
                                 98259
                                 1960212
                                 97460
                                 2623506
                                 2409123', stringsAsFactors=FALSE)[[1]]
Janna Maas

Using "line" as an object name does not cause this error, but avoid it anyway, because "line" is also the name of a function in the stats package of R.

The problem is that on every iteration you assign a new value to the object "movieID" or "customerID" itself, not to one of its indexes as the loop progresses.

So each time, "movieID" and "customerID" get replaced by the new value.

To assign values to array indexes, you have to create an empty array first, outside the loop.

Please replace "line" with any other object name.

movieID<-array() 
customerID<-array()

    for (i in 1:length(line)){
      #print(line[i])
      if(grep(':', line[i])==1 ){
        movieID[i]<-as.integer(substr(line[i],1,nchar(line[i])-1)  )
        next
      } 
      else{
        customerID[i]<-as.integer(line[i])
        #do something
      }
    }

Hope this might help @cloudiyang :)

Akarsh Jain
  • It's sad, I changed the object name and tried adding movieID<-array() and customerID<-array(), but it didn't work. – cloudiyang May 05 '17 at 06:04