0

I am a beginner at R. Can someone assist me with how to get the following job done in R.

I have connected my R to redshift(AWS) db and i am performing certain operations on the redshift tables.

From the source table orders i have created a data frame which holds all the possible combinations listing ,how different orders can be placed.I have a id column which list the unique combination (it is jst row number ,since every row holds a unique combination)

data frame which holds the following values:

amt  order_time  order_day  hour_day table_no  item_grp     id
  2      1             2       14       16         1        1
  1      2             1       18        12        2        2

In total, the data frame contains 1500 row entries in it.(meant 1500 possible combinations)

I want this data frame to act as a lookup table for sql table name orders which holds the order_id

orders table

order_id amt order_time order_day hour_day table_no item_grp
123 2 1 2 14 16 1
321 2 1 2 14 16 1
456 1 2 1 18 12 2

How can I pass the values in my data frame to a sql statement in where condition Like read every row of my data frame and get the values which satisfy the desired condition from orders table and list the rows in the format speicified below

Output table would look like:

order_id amt order_time order_day hour_day table_no item_grp id 123 2 1 2 14 16 1 1 321 2 1 2 14 16 1 1 456 1 2 1 18 12 2 2

And so on...

ankitkhanduri
  • 315
  • 1
  • 2
  • 9
  • Where is order_id coming from? – Mike K. Apr 04 '17 at 14:42
  • @Mike K. thanks mike, for pointing it out Have updated the question . – ankitkhanduri Apr 04 '17 at 15:13
  • You want to merge/join these two tables. It seems like a duplicate of the R-FAQ [How to join (merge) two tables in R](http://stackoverflow.com/q/1299871/903061), except that you keep mentioning SQL. SQL also allows you to join tables, of course, but you haven't provided any details about connecting R to a SQL database, and if your question is about how to do that it is not nearly specific enough. – Gregor Thomas Apr 04 '17 at 15:18

1 Answers1

0

Here is one solution. It uses the left_join() method from the dplyr package for dataframe manipulation.

For details, read this from the dplyr documentation: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

library(dplyr)         # %>% , left_join()
library(purrr)         # map_df() to remove factors from structure()

#sample data
order_details <-
        dput(
                structure(
                        list(
                                order_id = structure(1:3, .Label = c("123", "321",
                                                                     "456"), class = "factor"),
                                amt = structure(c(2L, 2L, 1L), .Label = c("1",
                                                                          "2"), class = "factor"),
                                order_time = structure(c(1L, 1L, 2L), .Label = c("1",
                                                                                 "2"), class = "factor"),
                                order_day = structure(c(2L, 2L, 1L), .Label = c("1",
                                                                                "2"), class = "factor"),
                                hour_day = structure(c(1L, 1L, 2L), .Label = c("14",
                                                                               "18"), class = "factor"),
                                table_no = structure(c(2L, 2L, 1L), .Label = c("12",
                                                                               "16"), class = "factor"),
                                item_grp = structure(c(1L, 1L, 2L), .Label = c("1",
                                                                               "2"), class = "factor")
                        ),
                        .Names = c(
                                "order_id",
                                "amt",
                                "order_time",
                                "order_day",
                                "hour_day",
                                "table_no",
                                "item_grp"
                        ),
                        row.names = c(NA,
                                      -3L), class = "data.frame"))

order_details <- purrr::map_df(purrr::map_df(order_details, as.character), as.integer)

#sample data contd.
orders <-
        dput(structure(
                list(
                        amt = c(2L, 1L),
                        order_time = 1:2,
                        order_day = c(2L,
                                      1L),
                        hour_day = c(14L, 18L),
                        table_no = c(16L, 12L),
                        item_grp = 1:2,
                        id = 1:2
                ),
                .Names = c(
                        "amt",
                        "order_time",
                        "order_day",
                        "hour_day",

                        "table_no",
                        "item_grp",
                        "id"
                ),
                row.names = c(NA,-2L),
                class = "data.frame"
        ))

# lookup order id
orders_augm <- orders %>%
        left_join(
                order_details,
                by = c(
                        "amt",
                        "order_time",
                        "order_day",
                        "hour_day",
                        "table_no",
                        "item_grp"
                )
        )

Result:

orders_augm
# A tibble: 3 × 8
    amt order_time order_day hour_day table_no item_grp    id order_id
  <int>      <int>     <int>    <int>    <int>    <int> <int>    <int>
1     2          1         2       14       16        1     1      123
2     2          1         2       14       16        1     1      321
3     1          2         1       18       12        2     2      456

Reordered columns:

orders_augm %>% 
        select(order_id, amt, 
               order_time, order_day, hour_day, 
               table_no, item_grp,    id )

Result

# A tibble: 3 × 8
  order_id   amt order_time order_day hour_day table_no item_grp    id
     <int> <int>      <int>     <int>    <int>    <int>    <int> <int>
1      123     2          1         2       14       16        1     1
2      321     2          1         2       14       16        1     1
3      456     1          2         1       18       12        2     2
knb
  • 9,138
  • 4
  • 58
  • 85