
I'm debugging a function that loops through a series of groups in an input file, fixing any within-group problems. In this snippet, I am only interested in fixing a specific factor level within a factor variable. Note that, within each subset, the values of Sex, UsualResidents, and StatusID are the same. In the larger data frame they differ, because these variables define the groups.

I've encountered the error:

Error in x[[jj]] <- v : 
attempt to select more than one element in integerOneIndex

on the 44th time the function is called. I have narrowed the problem down to the line below, where stfixval is passed into the function as a string; in this case, the string is Partnered. I wrote this snippet so that it hopefully behaves as it does inside the function. I've checked inside the function, and class(stfixval) is character.

stfixval <- "Partnered" 

ProblemData[is.na(ProblemData$StatusID)] <- stfixval

This line fills in the factor label (level?) wherever StatusID is NA, which happens as a result of an earlier join. The fix has worked for any of the 43 previous subsets that had an NA for this factor. I found this solution earlier on this site as a way to replace a missing factor value.

In case there is something odd about this subset, rather than typing it in here, I have put it in a public repo on GitHub (rds file here) so you can get the actual data. It's a very small file, only 10 rows and 8 columns.

If it was failing on other groups I would understand the problem. But I can't work out why it fails on this one. I tested in the console, too, and I get the same error trying to work on that subset data frame directly.

Why is it failing on this subset and what do I need to change to fix the problem?

Update: the dput() output

 structure(list(SingleAge = c(30, 34, 25, 32, 29, 33, 31, 28, 
 26, 27), Sex = c("Female", "Female", "Female", "Female", "Female", 
 "Female", "Female", "Female", "Female", "Female"), UsualResidents = structure(c(6L, 
 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("One Usual Resident", 
 "Two Usual Residents", "Three Usual Residents", "Four Usual Residents", 
 "Five Usual Residents", "Six Usual Residents", "Seven Usual Residents", 
 "Eight or More Usual Residents"), class = "factor"), Fits = c(0.825802885298389, 
 0.907626463702454, 0.563424742306557, 0.879067146224702, 0.788922087592366, 
 0.896242467639439, 0.855717792665186, 0.744666248145727, 0.632348764077823, 
 0.692617400687639), TotalinStatus = c(4L, 2L, 1L, 4L, 3L, 4L, 
 2L, 4L, 2L, 1L), StatusID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, NA), .Label = c("Partnered", "Non-partnered", "Not Elsewhere Included"
), class = "factor"), NuminDesStatus = c(2, 2, 1, 3, 3, 4, 2, 
 2, 2, 0), ExpectedCount = c(3, 2, 1, 4, 2, 4, 2, 3, 1, 1)), row.names = c(NA, 
 -10L), class = "data.frame")
Michelle
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 04 '21 at 05:13
  • I linked to the file in my GitHub repo. – Michelle Sep 04 '21 at 05:21
  • Data should be included in the question and not linked externally (links break over time). If it's a small data set, you should easily be able to include a `dput()` of your data in the body of your question. – MrFlick Sep 04 '21 at 05:22
  • I don't know if the problem is something corrupted in the file. It has worked 43 times previously. – Michelle Sep 04 '21 at 05:23
  • Are you trying to change the value of one column of the data.frame? Perhaps you mean `ProblemData$StatusID[is.na(ProblemData$StatusID)] <- stfixval` or `ProblemData[is.na(ProblemData$StatusID), "StatusID"] <- stfixval`. Or is `ProblemData` a data.table rather than a `data.frame`? The indexing just doesn't look right so I can't see how it would have worked for any input. – MrFlick Sep 04 '21 at 05:26
  • It's a data frame. And just the one column. The problem is that the variable is a factor with 3 levels, only one of which is being used here. So it has to take the underlying factor number as well as the label thing. – Michelle Sep 04 '21 at 05:28
  • `df$StatusID[is.na(df$StatusID)] <- stfixval` works – Ronak Shah Sep 04 '21 at 05:34
  • Thanks, both. I now wonder why it worked the other times. I had an error on an earlier subgroup and implemented that as the solution - and it worked, no NA in that column. That's why I was stuck on how it failed this other time. – Michelle Sep 04 '21 at 05:36
  • Working perfectly, including for the earlier groups, now to debug the next problem! Thanks! – Michelle Sep 04 '21 at 06:15
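
For anyone landing here later, the difference between the original line and the fix suggested in the comments can be sketched on a toy data frame. Everything below is hypothetical illustration data (`toy`, `broken`, `fixed` are made-up names, not the asker's objects). The sketch relies on the fact that a single index on a data frame, as in `df[mask] <- value`, selects columns rather than rows. That may be why the original line appeared to work on earlier subsets, although this has not been checked against the asker's other groups.

```r
# Hypothetical toy data, not the asker's: a factor column with one NA.
toy <- data.frame(
  Age = c(30, 34),
  StatusID = factor(c("Non-partnered", NA),
                    levels = c("Partnered", "Non-partnered"))
)
stfixval <- "Partnered"

# Original pattern: with a single index, the logical mask
# c(FALSE, TRUE) is applied to COLUMNS, so the whole second
# column is replaced by the recycled string.
broken <- toy
broken[is.na(broken$StatusID)] <- stfixval

# Fix from the comments: index the column itself, so the mask
# is applied to the elements (rows) of StatusID only.
fixed <- toy
fixed$StatusID[is.na(fixed$StatusID)] <- stfixval

print(broken)
print(fixed)
```

In `fixed`, StatusID stays a factor with its original levels and only the NA is filled. In `broken`, the non-missing value is overwritten as well. Note that assigning into a factor this way only works when `stfixval` is already one of its levels; otherwise R generates NA with an "invalid factor level" warning.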
