Error: replacement has 1810947 rows and data has 1810956 (same each time) for pupil data

Question

I have a large dataframe in R (data) made up of 23 .gazedata files (one for each subject):

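library(readr)  # provides read_tsv()
# pattern is a regular expression, not a glob, so escape the dot and anchor the extension
filenames <- list.files("~/Desktop/DUT Analyses 2019", pattern = "\\.gazedata$", full.names = TRUE)
ldf <- lapply(filenames, read_tsv)
data <- do.call("rbind", ldf)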

After creating factors and timing variables, I create the pupil variable, based on default validity parameters collected by the eye-tracker:

library(dplyr)  # provides select()
data$DiameterPupilLeftEye[data$ValidityLeftEye != 0] <- NA
data$DiameterPupilRightEye[data$ValidityRightEye != 0] <- NA
data$pupil <- rowMeans(select(data, DiameterPupilLeftEye, DiameterPupilRightEye), na.rm = TRUE)

Now I need to create an interpolated pupil variable (pupil_inter) that fills gaps of at most 4 consecutive missing samples:

data$pupil_inter <- na.approx(data$pupil, rule = 2, maxgap = 4)

However, the following error occurs:

Error in `$<-.data.frame`(`*tmp*`, pupil_inter, value = c(4.2120165, 4.20966425,  : 
  replacement has 1810947 rows, data has 1810956

The two row counts are exactly the same every time I run the code.

Crucially, if I exclude the .gazedata files for subjects 22 and 23 from the pre-processing, the interpolation line works and there is no error.

I have tried identifying an existing "replacement has [x] rows, data has [y]" problem to help with my specific issue, but can't find a relevant solution. All .gazedata files were collected using the same hardware and software.

The error persists even when I first create pupil_inter as an empty (all-NA) column, which itself succeeds, using the following code:

data$pupil_inter <- NA

Thanks in advance for any advice offered.

Just a guess, as I don't have access to your data... Does `data$pupil_inter<- na.approx(data$pupil, rule = 2, maxgap = 4, na.rm = FALSE)` work? — Stewart Macdonald, Mar 13 '19 at 10:26
That worked, thank you. Apologies it was so simple. I shall close the question, thanks again! — HairyBiscuits, Mar 13 '19 at 20:21
I'll add it as a proper answer with an explanation so you can mark this question as answered. — Stewart Macdonald, Mar 14 '19 at 02:50

score 2 · Accepted Answer · answered Mar 14 '19 at 02:58

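The error occurs because na.approx() is returning fewer values than you're passing in, so you're trying to assign a column of 1,810,947 values to a dataframe that has 1,810,956 rows. Having a look at the documentation for na.approx() (from the zoo package), I see that there's an na.rm parameter. When it is TRUE (which it is by default), any leading or trailing NAs that are still present after interpolation are dropped from the returned value, so you end up with fewer values than you started with. As far as I can tell, with maxgap = 4 a run of more than four NAs at the very start or end of data$pupil is left uninterpolated even with rule = 2, which would account for the nine missing rows. If you set na.rm = FALSE, the NAs are kept and the result should have the same number of rows as the input. Try this: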

data$pupil_inter <- na.approx(data$pupil, rule = 2, maxgap = 4, na.rm = FALSE)
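
If you want to see the effect on something small first, here is a minimal sketch. The toy vector and the variable names (pupil, trimmed, kept, leading_na, trailing_na) are made up for illustration and are not taken from your data. It compares the lengths returned with and without na.rm = FALSE, then uses base R's rle() to count how many NAs sit at the start and end of the series.

```r
library(zoo)

# Hypothetical toy series: 5 leading NAs, one interior NA, 6 trailing NAs
pupil <- c(rep(NA, 5), 4.2, NA, 4.0, 3.9, rep(NA, 6))

# Default na.rm = TRUE: NAs left at the ends after interpolation are trimmed
trimmed <- na.approx(pupil, rule = 2, maxgap = 4)

# na.rm = FALSE: the result keeps the same length as the input
kept <- na.approx(pupil, rule = 2, maxgap = 4, na.rm = FALSE)

length(pupil)    # input length
length(trimmed)  # expected to be shorter than the input
length(kept)     # expected to match the input

# Count the NAs at the start and end of the series with run-length encoding
runs <- rle(is.na(pupil))
leading_na  <- if (runs$values[1]) runs$lengths[1] else 0
trailing_na <- if (tail(runs$values, 1)) tail(runs$lengths, 1) else 0
```

Running the same rle() check on data$pupil should show whether long NA runs at the start or end of the combined data explain the missing rows, though that is an assumption until you check it against your files.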
