I'm trying to create a data frame using a function.

The situation: I have the following dataframe with the information: Code, Subcode & Description. Each code stands for an industry and the subcode for a specific industry inside.

> head(industries.split)
  Code Subcode                                  Description
1   13      00                                AEROSPACE    
2   13      10    Engines, Components & Parts Manufacturers
3   13      20 Military & Commercial Aircraft Manufacturers
4   13      30        Missile & Missile Parts Manufacturers
5   13      40    Private & Business Aircraft Manufacturers
6   13      50                   Miscellaneous Aerospace   
> tail(industries.split)
    Code Subcode                                     Description
198   85      91                                 Wholesalers    
199   85      92                      Miscellaneous Companies   
200   86      00             REUTERS FUNDAMENTALS-SOURCED DATA  
201   86      10 Industrial/Commercial format; Industry group NA
202   86      20                   Utilities; Industry group NA 
203   86      30                  Bank format; Industry group NA

I want to combine the code with the subcode and exclude the subcode afterwards. For this, I wrote the following function, where name is a placeholder for the industry and code is the industry code.

Industry.Filter <- function(name, code){
  name <- industries.split %>%
    filter(Code == code)
  name[,1] <- paste(name[,1], name[,2],sep = "")
  name <- name[,-2]
}

The code works, but it doesn't store the value in a dataframe.

It only works when I assign the result to a data frame separately:

aerospace <- Industry.Filter(aerospace, 13)

How can I use this function without having aerospace <- in front of the call?

halfer
Yannik
  • There are ways to do it, but they are not recommended. What is the problem with doing `aerospace <- Industry.Filter(aerospace, 13)`? – Ronak Shah Jul 17 '20 at 12:16
  • There is no direct problem. I am just trying to figure out whether there is a way to include the assignment step in the function to save some time. – Yannik Jul 17 '20 at 12:19
  • https://stackoverflow.com/questions/9726705/assign-multiple-objects-to-globalenv-from-within-a-function – Ronak Shah Jul 17 '20 at 12:21
  • This won't save you time. It will cost you time in the long run if you try to implement writing a variable to the calling environment. Trust us, this is something you should avoid unless you really really know what you are doing. If you want to learn more about how it could be done, check out `?assign` in R – Allan Cameron Jul 17 '20 at 12:22
  • @AllanCameron Is it possible this is just a confused question about writing a function the OP can use in a pipe? If not, then I 100% agree with you. – Limey Jul 17 '20 at 12:37

1 Answer

It can be done, but probably not with a function of the form you're using at the moment and even then not as easily as you might want it to be. The cause of the difficulty is dplyr's use of non-standard evaluation, or NSE.

You're using pipes (%>%). Recall that by default the pipe means "use the object on the left-hand side of the pipe as the first argument to the function on the right-hand side of the pipe".
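As a small illustration of that rule, here is a sketch using a toy numeric vector (not the question's data). The names `x`, `piped` and `direct` are made up for the example; the two calls should be equivalent:

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

x <- c(1.234, 5.678)  # hypothetical toy data

# The pipe passes `x` as the first argument of round()
piped <- x %>% round(1)

# The same call written without the pipe
direct <- round(x, 1)

identical(piped, direct)
```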

So if you had a function of the form

Industry.Filter <- function(data, code)

You could use it in a statement like

df <- df %>% Industry.Filter(code)

That would run into problems with non-standard evaluation if you wanted, for example, the name of the column on which you're filtering to be variable (though even that is doable). Fortunately, that doesn't seem to be what you want here. Something like this (untested code):

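library(dplyr)

Industry.Filter <- function(data, code) {
  data %>%
    # keep only the rows for the requested industry code
    filter(Code == code) %>%
    # combine code and subcode into a single Code column
    mutate(Code = paste0(Code, Subcode)) %>%
    # drop the now redundant Subcode column
    select(-Subcode)
}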

and then

aerospace <- df %>% Industry.Filter(13)

should give you what you want.
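To try the approach end to end, here is a minimal sketch that rebuilds a few rows from the question by hand. It assumes that `Code` and `Subcode` are stored as character columns (the printed `00` subcodes suggest this, but the question does not say so), and it passes `industries.split` in place of the `df` placeholder:

```r
library(dplyr)

# Toy subset of the question's data; the column types are an assumption
industries.split <- data.frame(
  Code = c("13", "13", "85", "86"),
  Subcode = c("00", "10", "91", "00"),
  Description = c(
    "AEROSPACE",
    "Engines, Components & Parts Manufacturers",
    "Wholesalers",
    "REUTERS FUNDAMENTALS-SOURCED DATA"
  ),
  stringsAsFactors = FALSE
)

Industry.Filter <- function(data, code) {
  data %>%
    filter(Code == code) %>%                  # rows for one industry
    mutate(Code = paste0(Code, Subcode)) %>%  # e.g. "13" + "10" -> "1310"
    select(-Subcode)                          # drop the separate subcode
}

aerospace <- industries.split %>% Industry.Filter(13)
aerospace
```

Because the pipeline is the last expression in the function body, its value is returned visibly. A function whose last statement is an assignment, such as `name <- name[,-2]`, returns its value invisibly, which is probably why calling the original function without `aerospace <-` appears to do nothing.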

Limey