I am currently facing the following problem. While analyzing a large data set (roughly 3 million observations), I need to convert a variable from one format to another. Specifically, I have the date of incorporation of several firms, but it comes in several formats: YYYY, MM-DD-YYYY, or other variants in which the last 4 characters are always the year.
All I need is the year, so I wrote this code:
library(stringi)

# Visit every element one by one and reduce it to its four-character year
for (i in seq_along(amadeus$Dateofincorporation)) {
  if (!is.na(amadeus$Dateofincorporation[i]) &&
      nchar(amadeus$Dateofincorporation[i]) == 4) {
    # Already a bare year: leave it unchanged
    amadeus$Dateofincorporation[i] <- amadeus$Dateofincorporation[i]
  } else if (!is.na(amadeus$Dateofincorporation[i]) &&
             nchar(amadeus$Dateofincorporation[i]) != 4) {
    # Longer date string: keep only the last four characters (the year)
    amadeus$Dateofincorporation[i] <- stri_sub(amadeus$Dateofincorporation[i], -4, -1)
  } else {
    # Missing value: leave it as NA
    amadeus$Dateofincorporation[i] <- amadeus$Dateofincorporation[i]
  }
}
The code runs for a long time and then returns this output:
Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
4: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
5: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
6: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
Does anyone have an idea on how to deal with this?
P.S. The column is currently a character vector. Could that have an impact?
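
For reference, the transformation I am after can also be stated without a loop. The following is only a minimal sketch on a made-up vector (`dates` and its values are illustrative, not taken from `amadeus`). It uses base R's `substr` and `nchar` instead of `stri_sub`, and it assumes every non-missing value ends in a four-character year:

```r
# Hypothetical sample standing in for amadeus$Dateofincorporation
dates <- c("1998", "03-15-2004", NA, "15/03/2010")

# substr() and nchar() are vectorized, so the whole vector is handled in
# one call. For each string, the start position is "length minus 3" and
# the stop position is the length, i.e. the last four characters.
# Missing values (NA) in a character vector remain NA.
years <- substr(dates, nchar(dates) - 3, nchar(dates))

print(years)
```

On the real data, the same expression would be applied to `amadeus$Dateofincorporation` in a single assignment instead of element by element.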