I want to build a Keras model following the tutorial at https://blogs.rstudio.com/tensorflow/posts/2018-01-24-keras-fraud-autoencoder/ using the following data:

TX_ID          SENDER_ACCOUNT_ID RECEIVER_ACCOUNT_ID     TX_TYPE           TX_AMOUNT          TIMESTAMP     
 Min.   :       1   Min.   :    0     Min.   :    0       TRANSFER:12476012   Min.   :       0   Min.   :  0.00  
 1st Qu.: 3119004   1st Qu.:25007     1st Qu.:23989                           1st Qu.:      21   1st Qu.: 49.00  
 Median : 6238006   Median :49936     Median :48825                           Median :     155   Median : 99.00  
 Mean   : 6238006   Mean   :49542     Mean   :49532                           Mean   :   22643   Mean   : 99.46  
 3rd Qu.: 9357009   3rd Qu.:75006     3rd Qu.:73955                           3rd Qu.:     448   3rd Qu.:149.00  
 Max.   :12476012   Max.   :99999     Max.   :99999                           Max.   :21474836   Max.   :199.00  
  IS_FRAUD           ALERT_ID       
 False:12458960   Min.   :  -1.000  
 True :   17052   1st Qu.:  -1.000  
                  Median :  -1.000  
                  Mean   :   1.894  
                  3rd Qu.:  -1.000  
                  Max.   :3999.000  
> 
> str(df)
'data.frame':   12476012 obs. of  8 variables:
 $ TX_ID              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ SENDER_ACCOUNT_ID  : int  5942 86700 86700 86700 86700 86700 86700 86700 86700 86700 ...
 $ RECEIVER_ACCOUNT_ID: int  92982 43995 95516 83911 82801 10605 88864 25971 74981 42920 ...
 $ TX_TYPE            : Factor w/ 1 level "TRANSFER": 1 1 1 1 1 1 1 1 1 1 ...
 $ TX_AMOUNT          : num  517 198 198 198 198 ...
 $ TIMESTAMP          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ IS_FRAUD           : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
 $ ALERT_ID           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ..

I get an error at this step of the tutorial: "Now let’s create normalized versions of our datasets. We also transformed our data frames to matrices since this is the format expected by Keras."

desc <- df_train %>% 
  select(-Class) %>% 
  get_desc()

x_train <- df_train %>%
  select(-Class) %>%
  normalization_minmax(desc) %>%
  as.matrix()

x_test <- df_test %>%
  select(-Class) %>%
  normalization_minmax(desc) %>%
  as.matrix()

I get the error:

Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : ‘min’ not meaningful for factors

How can I solve this problem?

florisviss

1 Answer

The functions of interest from the post are these:

get_desc <- function(x) {
  map(x, ~list(
    min = min(.x),
    max = max(.x),
    mean = mean(.x),
    sd = sd(.x)
  ))
} 

#' Given a dataset and normalization constants it will create a min-max normalized
#' version of the dataset.
normalization_minmax <- function(x, desc) {
  map2_dfc(x, desc, ~(.x - .y$min)/(.y$max - .y$min))
}
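
To make the failure concrete, here is a minimal base-R sketch of what min-max scaling does to one column, and what happens when the same `min()` call hits a factor. The values are made up for illustration and are not taken from the asker's data.

```r
# Min-max scaling maps the smallest value to 0 and the largest to 1.
amount <- c(0, 21, 155, 448)  # made-up values
scaled <- (amount - min(amount)) / (max(amount) - min(amount))

# The same min() call on a factor raises an error, which is what
# get_desc() runs into when a factor column is passed to it.
tx_type <- factor(c("TRANSFER", "TRANSFER"))
err <- tryCatch(min(tx_type), error = function(e) conditionMessage(e))
print(err)
```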

You are applying these functions to a data.frame that contains a factor. min and max are not defined for an unordered factor, because its levels have no inherent order. Your factor (TX_TYPE) is constant anyway, so the easiest fix is to remove it from the data frame at the start. If you have factors that you need to include in your model, you have to encode them as numeric values first, for instance with one-hot encoding.

df <- df %>% select(-TX_TYPE)
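
Both options can be sketched in base R on a small made-up data frame. Note that `toy` and the `CHANNEL` column are hypothetical and only serve as illustration; `model.matrix()` is one of several ways to one-hot encode a factor.

```r
# Toy data (hypothetical values) with one constant factor and one
# multi-level factor; CHANNEL is an invented column for illustration.
toy <- data.frame(
  TX_AMOUNT = c(517, 198, 42, 7),
  TX_TYPE   = factor(rep("TRANSFER", 4)),
  CHANNEL   = factor(c("web", "app", "web", "atm"))
)

# Option 1: drop factor columns with a single level (they carry no information).
is_constant_factor <- vapply(
  toy,
  function(col) is.factor(col) && nlevels(col) < 2,
  logical(1)
)
toy <- toy[, !is_constant_factor, drop = FALSE]

# Option 2: one-hot encode the remaining factor.
# "- 1" removes the intercept so that every level gets its own 0/1 column.
one_hot <- model.matrix(~ CHANNEL - 1, data = toy)

# Combine the numeric column with the encoded ones; all columns are numeric now.
toy_num <- cbind(toy[, "TX_AMOUNT", drop = FALSE], as.data.frame(one_hot))
```

After this step every column of `toy_num` is numeric, so `min()` and `max()` are defined for all of them.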
thothal
  • Thanks, when removing it from the data frame I still get the same error. Even when translated to numeric values. – florisviss Dec 06 '19 at 12:51
  • The error message says that you are using min/max on a `factor`, so there must still be a factor in your `data.frame`. My guess is that the response (`IS_FRAUD`) is still in it. Next time, it would be much easier to help if you provided a good [reprex](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – thothal Dec 06 '19 at 12:59
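
One way to check which columns are still factors before computing the normalization constants is sketched below, using base R on a made-up data frame `toy` (hypothetical values). It assumes that `IS_FRAUD` plays the role of the tutorial's `Class` label and therefore has to be kept out of the feature matrix.

```r
# Toy data (hypothetical values) mimicking the column types in the question.
toy <- data.frame(
  TX_AMOUNT = c(517, 198, 42, 7),
  TIMESTAMP = c(0L, 0L, 1L, 2L),
  IS_FRAUD  = factor(c("False", "False", "True", "False"))
)

# Names of all columns that are still factors; min()/max() fail on these.
factor_cols <- names(toy)[vapply(toy, is.factor, logical(1))]
print(factor_cols)

# Keep the label separately as 0/1 and drop it from the features.
y <- as.integer(toy$IS_FRAUD == "True")
x <- toy[, setdiff(names(toy), factor_cols), drop = FALSE]

# range() (i.e. min and max) now works on every remaining column.
ranges <- lapply(x, range)
```

If `factor_cols` is not empty for the feature data frame, one of the listed columns is the one that triggers the error.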