Good evening everybody,
I'm stuck about the construction of the for loop, I don't have any problem, buit I'd like to understand how I can create dataframe "independents" (duplicite with some differences).
I wrote the code step by step (it works), but I think that, maybe, there is a way to compact the code with the for.
x
is my original data.frame
str(x)
Classes ‘data.table’ and 'data.frame': 13500 obs. of 6 variables:
$ a: int 1 56 1058 567 987 574 1001...
$ b: int 10 5 10 5 5 10 10 5 10 10 ...
$ c: int NA NA NA NA NA NA NA NA NA NA ...
$ d: int 0 0 0 0 0 0 0 0 0 0 ...
$ e: int 0 0 0 0 0 0 0 0 0 0 ...
$ f: int 22 22 22 22 22 22 22 22 22 22 ...
My first goal is to delete per every column the eventualy NA and "" elements. I do this by these codes of rows.
x_b<- x[!(!is.na(x$b) & x$b==""), ]
x_c<- x[!(!is.na(x$c) & x$c==""), ]
x_d<- x[!(!is.na(x$d) & x$d==""), ]
x_e<- x[!(!is.na(x$e) & x$e==""), ]
x_f<- x[!(!is.na(x$f) & x$f==""), ]
After this the second goal is to create per each new data.frame a id code that I create using the function paste0(x_b$a, x_b$f)
.
x_b$ID_1<-paste0(x_b$a, x_b$b)
x_c$ID_2<-paste0(x_c$a, x_c$c)
x_d$ID_3<-paste0(x_c$a, x_c$d)
x_e$ID_4<-paste0(x_c$a, x_c$e)
x_f$ID_5<-paste0(x_c$a, x_c$f)
I created this for loop to try to minimize the rows that I use, and to create a good code visualization.
z<-data.frame("a", "b","c","d","e","f")
zy<-data.frame("x_b", "x_c", "x_d", "x_e", "x_f")
for(i in z) {
for (j in zy ) {
target <- paste("_",i)
x[[i]]<-(!is.na(x[[i]]) & x[[i]]=="") #with this I able to create a column on the x data.frame,
#but if I put a new dataframe the for doesn't work
#the name, but I don't want this. I'd like to create a
#data.base per each transformation.
#at this point of the script, I should have a new
#different dataframe, as x_b, x_c, x_d, x_e, x_f but I
#don't know
#How to create them?
#If I have these data frame I will do this anther function
#in the for loop:
zy[[ID]]<-paste0(x_b$a, "_23X")
}
}
I'd like to have as output this:
str(x_b)
Classes ‘data.table’ and 'data.frame': 13500 obs. of 6 variables:
$ a: int 1 56 1058 567 987 574 1001...
$ b: int 10 5 10 5 5 10 10 5 10 10 ...
$ c: int NA NA NA NA NA NA NA NA NA NA ...
$ d: int 0 0 0 0 0 0 0 0 0 0 ...
$ e: int 0 0 0 0 0 0 0 0 0 0 ...
$ f: int 22 22 22 22 22 22 22 22 22 22 ...
$ ID: int 1_23X 56_23X 1058_23X 567_23X 987_23X 574_23X 1001_23X...
and so on.
I think that there is some important concept about the dataframe that I miss.
Where I wrong?
Thank you so much in advance for the support.