I have put a series of lapply statements, which extract postal codes from 5 large text fields, in a function:
opm_naar_postc=function(kolom1,kolom2,kolom3,kolom4,kolom5) {
postc=lapply(kolom1, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc1=lapply(kolom1, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc2=lapply(kolom2, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc3=lapply(kolom2, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc4=lapply(kolom3, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc5=lapply(kolom3, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc6=lapply(kolom4, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc7=lapply(kolom4, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc8=lapply(kolom5, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
postc9=lapply(kolom5, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
Then I want to remove any spaces, dots, NAs, etc. from postc to postc9:
postcodes=c("postc","postc1","postc2","postc3","postc4","postc5","postc6","postc7","postc8","postc9")
for (i in postcodes) {
i=gsub(" ","",i)
i=gsub("NA|[[:punct:]]","",i) }
Finally, I paste postc to postc9 together so that one variable is left. This variable is my return value. I call the function like this:
df = df %>% mutate(postcode=opm_naar_postc(var1,var2,var3,var4,var5))
First of all, the for loop doesn't work (no error, but it doesn't do anything). The cleanup does work when I don't use a for loop. Second, I want to put all 10 lapply calls in one for loop; is that possible? I've tried a lot of things, but nothing seems to work...
Who can help me?
Thanks!
An example of my dataframe df:
var1 var2 var3 var4 var5
blablaehdhde blablatext blabla 1983 rf blablatext blablatext
1982 rf blabla text blala blablbal blaakakk text hahahahah
blblatext textte8743GH sdkhflksfjf kjsnhblabla gagagagag
Expected outcome:
postcode
1983rf
1982rf
8743GH
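
For reference, a minimal sketch of one way this might be approached, not a definitive solution. The loop above iterates over the character strings "postc", "postc1", ... rather than over the objects with those names, and it only reassigns the loop variable i, which would explain why nothing changes. The sketch below loops over the columns themselves. Several things in it are assumptions: the helper name first_postcode, the pattern (it allows a first digit of 1-9 because the expected outcome contains 8743GH, and it uses lookarounds instead of \\D so that a code at the very start of a string can match), and the way the example rows are split into columns.

```r
# Assumed pattern: four digits (first one 1-9), an optional space, two letters.
# The lookarounds only forbid an adjacent digit, so a code at the very
# start or end of a string can still match.
postcode_pattern <- "(?<![0-9])[1-9][0-9]{3} ?[A-Za-z]{2}(?![0-9])"

# Hypothetical helper: first postal code in each element of one column,
# or NA when an element contains none.
first_postcode <- function(kolom) {
  kolom <- as.character(kolom)
  hits <- regmatches(kolom, gregexpr(postcode_pattern, kolom, perl = TRUE))
  vapply(hits, function(h) if (length(h) > 0) h[1] else NA_character_,
         character(1))
}

opm_naar_postc <- function(...) {
  kolommen <- list(...)
  # One pass over the columns replaces the ten separate lapply calls.
  gevonden <- lapply(kolommen, first_postcode)
  # Per row, keep the match from the first column that has one.
  resultaat <- Reduce(function(a, b) ifelse(is.na(a), b, a), gevonden)
  # Strip spaces and punctuation; rows without any match stay NA.
  gsub("[[:space:][:punct:]]", "", resultaat)
}

# Example data; the split of each row into columns is a guess.
df <- data.frame(
  var1 = c("blablaehdhde", "1982 rf blabla", "blblatext"),
  var2 = c("blablatext", "text", "textte8743GH"),
  var3 = c("blabla 1983 rf", "blala", "sdkhflksfjf"),
  var4 = c("blablatext", "blablbal", "kjsnhblabla"),
  var5 = c("blablatext", "blaakakk text hahahahah", "gagagagag"),
  stringsAsFactors = FALSE
)

df$postcode <- opm_naar_postc(df$var1, df$var2, df$var3, df$var4, df$var5)
print(df$postcode)
```

Because the function takes vectors and returns a vector of the same length, it should also work inside the mutate call shown above.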