1

I have a database consisting of three tables like this:

database

I want to make a machine learning model in R using that database, and the data I need is like this:

Required data

I can use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes. But, how to set flag (yes or no) to the data value based on "result (using SQL query)" data above?

2 Answers2

0

This seems unclear to me. What do you mean with "how to set flag (yes or no) to the data value based on "result (using SQL query)" data"? Do you want to convert one of the column to a boolean value? If so you need to specify the decision rule. This may look like this:

SELECT (... other columns),
CASE case_expression
     WHEN when_expression_1 THEN 'yes'
     WHEN when_expression_2 THEN 'no'
     ELSE '' 
END

To help others help you: - which SQL variant do you use? (e.g. would a sqlite solution work for you?) - provide an sql script of your table creation, plus a script to "use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes"

mayeulk
  • 100
  • 8
0

We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:

library(dplyr) ## dplyr and tidyr loaded for wrangling
library(tidyr)
options(dplyr.width = Inf) ## we want to show all columns of result
yes_fun <- function(x) { ## helps with pivot_wider() below
    if ( length(x) > 0 ) {
        return("yes")
    }
}
sql_result %>%
    separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
    pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
                values_from = pengolahan,
                values_fill = list(pengolahan = "no"),
                values_fn = list(pengolahan = yes_fun))

Data

id_pangan  <- 1:3
kategori   <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")
batas      <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)

# A tibble: 3 x 8
  id_pangan kategori batas Penggilingan Perebusan Pengupasan
      <int> <fct>    <dbl> <chr>        <chr>     <chr>     
1         1 Daging     100 yes          yes       no        
2         2 Buah        50 no           no        yes       
3         3 Susu       200 no           no        no        
  Fermentasi Sterilisasi
  <chr>      <chr>      
1 no         no         
2 no         no         
3 yes        yes  
duckmayr
  • 16,303
  • 3
  • 35
  • 53