Good evening everybody,

I'm stuck on how to construct a for loop. My code works, but I'd like to understand how I can create "independent" data frames (duplicates of the original with some differences).

I wrote the code step by step (it works), but I think there may be a way to make it more compact with a for loop.

x is my original data.frame

str(x)
Classes ‘data.table’ and 'data.frame':  13500 obs. of  6 variables:
 $ a: int  1 56 1058 567 987 574 1001...
 $ b: int  10 5 10 5 5 10 10 5 10 10 ...
 $ c: int  NA NA NA NA NA NA NA NA NA NA ...
 $ d: int  0 0 0 0 0 0 0 0 0 0 ...
 $ e: int  0 0 0 0 0 0 0 0 0 0 ...
 $ f: int  22 22 22 22 22 22 22 22 22 22 ...

My first goal is to delete, for every column, any NA and "" elements. I do this with the following lines of code.

x_b<- x[!(!is.na(x$b) & x$b==""), ]
x_c<- x[!(!is.na(x$c) & x$c==""), ]
x_d<- x[!(!is.na(x$d) & x$d==""), ]
x_e<- x[!(!is.na(x$e) & x$e==""), ]
x_f<- x[!(!is.na(x$f) & x$f==""), ]

After this, the second goal is to create an ID code for each new data.frame, using the function paste0(), for example paste0(x_b$a, x_b$b).

x_b$ID_1<-paste0(x_b$a, x_b$b)
x_c$ID_2<-paste0(x_c$a, x_c$c)
x_d$ID_3<-paste0(x_d$a, x_d$d)
x_e$ID_4<-paste0(x_e$a, x_e$e)
x_f$ID_5<-paste0(x_f$a, x_f$f)

I wrote this for loop to try to reduce the number of lines I use and to make the code easier to read.

z<-data.frame("a", "b","c","d","e","f")
zy<-data.frame("x_b", "x_c", "x_d", "x_e", "x_f")


for(i in z) {
  for (j in zy ) {
    target <- paste("_",i)
    x[[i]]<-(!is.na(x[[i]]) & x[[i]]=="") #with this I am able to create a column in the x data.frame,
                                          #but if I put a new data frame here, the loop doesn't work.
                                          #I don't want to change x itself. I'd like to create a
                                          #data frame for each transformation.

                                          #at this point of the script, I should have new,
                                          #separate data frames, such as x_b, x_c, x_d, x_e, x_f, but I
                                          #don't know how to create them.

                                          #How do I create them?

                                          #Once I have these data frames, I will apply another function
                                          #in the for loop:
    zy[[ID]]<-paste0(x_b$a, "_23X")
   }
}

I'd like to get this output:

str(x_b)
    Classes ‘data.table’ and 'data.frame':  13500 obs. of  7 variables:
     $ a: int  1 56 1058 567 987 574 1001...
     $ b: int  10 5 10 5 5 10 10 5 10 10 ...
     $ c: int  NA NA NA NA NA NA NA NA NA NA ...
     $ d: int  0 0 0 0 0 0 0 0 0 0 ...
     $ e: int  0 0 0 0 0 0 0 0 0 0 ...
     $ f: int  22 22 22 22 22 22 22 22 22 22 ...
     $ ID: chr  "1_23X" "56_23X" "1058_23X" "567_23X" "987_23X" "574_23X" "1001_23X"...

and so on.

I think I'm missing some important concept about data frames.

Where am I going wrong?

Thank you so much in advance for the support.

Gregor Thomas
Earl Mascetti

1 Answer

There is a simple way to do this with the tidyverse package(s):

First goal:

drop_na(df)

You can also use na_if if you want to convert "" to NA.
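As a rough illustration of how the two functions combine, here is a minimal sketch. It assumes dplyr and tidyr are installed, and it uses a made-up toy data frame `df` with a character column `b` (the question's columns are integers, where "" cannot actually occur):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; only the column names follow the question
df <- data.frame(a = c(1L, 56L, 1058L),
                 b = c("10", "", NA),
                 stringsAsFactors = FALSE)

cleaned <- df %>%
  mutate(b = na_if(b, "")) %>%  # turn "" into NA
  drop_na(b)                    # then drop the rows where b is NA
```

Passing a column name to drop_na limits the check to that column; calling it without column names drops every row that has an NA anywhere.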

Second goal: use mutate to create a new variable:

df <- df %>% 
 mutate(id = paste0(a, "_23X"))
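If a separate data frame is needed for each column, as the question asks, one possible approach is to loop over the column names and collect the results in a named list. The following is only a sketch in base R on a made-up toy `x`; the names `cols`, `result`, `keep` and `d` are introduced here for illustration and are not from the question:

```r
# Hypothetical toy data shaped like the question's x
x <- data.frame(a = c(1L, 56L, 1058L),
                b = c(10L, 5L, NA),
                c = c(NA, 3L, 7L))

cols <- c("b", "c")   # columns to process, one data frame per column
result <- list()      # will hold x_b, x_c, ...

for (col in cols) {
  v <- x[[col]]
  keep <- !is.na(v) & v != ""          # drop NA and "" in the current column
  d <- x[keep, , drop = FALSE]         # independent filtered copy; x is untouched
  d$ID <- paste0(d$a, d[[col]])        # ID built as in the question
  result[[paste0("x_", col)]] <- d     # store under the name x_<column>
}
```

Each element is then available as `result$x_b`, `result$x_c`, and so on, and the original `x` stays unchanged.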
novica
  • Thank you so much for your answer @novica. Could you please be more precise about the "second goal"? – Earl Mascetti Dec 11 '19 at 08:38
  • Hey @Befrancesco I just noticed that I had some typos in my response. Fixed that. Can you tell me what you need me to be more precise about? – novica Dec 11 '19 at 15:38
  • Hello @novica I'd like to understand how I can put your code in a for loop. Thank you in advance for your time :) – Earl Mascetti Dec 11 '19 at 15:47
  • Ah, @Befrancesco no need to put it in a for loop. `mutate` will go through each row and create a new variable. See some examples here: https://dplyr.tidyverse.org/reference/mutate.html – novica Dec 11 '19 at 18:02
  • My question was about creating new, independent data.frames in a for loop, not variables in an existing data frame. Could you give an example, please? I don't understand it this way, sorry. @novica – Earl Mascetti Dec 11 '19 at 19:47
  • I don't quite understand what you want to achieve. How many data.frames do you need to have at the end? What is the difference among them? – novica Dec 11 '19 at 23:31
  • I'd like to calculate something in a for loop with different variables from the same data frame. For every calculation (column) I need to create a different data frame: one data frame that holds the data, and many data frames as results. – Earl Mascetti Dec 12 '19 at 07:52
  • OK. I think this answer has what you need: https://stackoverflow.com/questions/33180753/create-multiple-data-frames-from-one-based-off-values-with-a-for-loop – novica Dec 12 '19 at 11:34
  • Thank you so much for your effort. I think that I'm going to solve my issue. If you want to complete your answer, I could give you a score up and accept it. – Earl Mascetti Dec 12 '19 at 13:53
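
To end up with separately named objects such as x_b and x_c, rather than elements of a list, one option is base R's list2env. This is a minimal sketch that assumes the data frames have already been collected in a named list; the list `frames` and the environment `env` are made up for illustration:

```r
# Hypothetical named list of data frames, e.g. the result of a loop
frames <- list(
  x_b = data.frame(a = 1:2, b = c(10L, 5L)),
  x_c = data.frame(a = 1:2, c = c(3L, 7L))
)

env <- new.env()
list2env(frames, envir = env)   # creates env$x_b and env$x_c

# list2env(frames, envir = globalenv()) would create x_b and x_c
# directly in the workspace instead
```

Keeping the data frames in a list is often easier to work with, since later steps can loop over them with lapply instead of referring to each object by name.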