I'm trying to create a new data frame from existing data frames:

The new data frame (new_dataframe) should contain the monthly purchased quantities for a specific year (2017) and two specific categories (c1 and c2). My existing data frames are:

  • df1 = client_data (clientId, city, category)
  • df2 = purchase_data (clientId, quantity, price, purchase_date)

I've tried subset() and aggregate(), but it didn't work for me.
For instance, to get the data for the specific year (just part of the solution), I used this code:

new_dataframe <- subset(
  AllInfosClients, 
  AllInfosClients$Date_achat == as.Date(AllInfosClients$Date_acha,"%d/%m/2017")
)

Any help will be appreciated.

wibeasley
Nabil
  • This could have many error sources. I think it would be helpful if you could post example data for your source dataframes. – Roman Oct 02 '18 at 21:30
  • This is what my two data frames look like once merged: – Nabil Oct 02 '18 at 21:38
  • Client Ville Category Qte Montant Date_achat 1 Cl1 Marseille S7 28 2750 16/05/2015 2 Cl1 Marseille S7 27 2570 03/06/2015 3 Cl3 Marseille S14 25 1240 21/11/2015 4 Cl3 Marseille S14 18 1560 21/10/2016 5 Cl3 Marseille S14 15 1460 30/11/2016 6 Cl5 Grenoble S15 30 1980 19/03/2016 7 Cl9 Marseille S10 22 2030 19/07/2015 – Nabil Oct 02 '18 at 21:40
  • @Nabil, check out [this advice](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) about posting your data with `dput()`. Edit your post to include it, then delete the comment (because that unformatted data is just noise then). – wibeasley Oct 02 '18 at 21:47
  • But I think it looks more structured the way I copied it than with the dput() function. – Nabil Oct 02 '18 at 22:05
  • It's not about how it looks. We can paste the output of your dput() inside our own R sessions and easily have an example of your data; then we can test your code and give you a good solution. For that, we also need the example of your two original data frames, not the merged one. – Carlos Eduardo Lagosta Oct 02 '18 at 22:32

1 Answer

Here's a tidyverse solution.

I used dplyr::full_join() to merge df1 and df2, converted the dates to Date format with lubridate::dmy(), and then used dplyr::filter() to keep year 2015 and categories S7 and S14 (the sample data contains no 2017 purchases):

library(dplyr)
library(lubridate)

# merged data as posted in the OP's comment
new_dataframe <- read.table(text = "
  Client Ville Category Qte Montant Date_achat
1 Cl1 Marseille S7 28 2750 16/05/2015
2 Cl1 Marseille S7 27 2570 03/06/2015
3 Cl3 Marseille S14 25 1240 21/11/2015
4 Cl3 Marseille S14 18 1560 21/10/2016
5 Cl3 Marseille S14 15 1460 30/11/2016
6 Cl5 Grenoble S15 30 1980 19/03/2016
7 Cl9 Marseille S10 22 2030 19/07/2015",
                            header = TRUE,
                            stringsAsFactors = FALSE) %>%
  as_tibble()

# reconstruct df1 and df2 from the merged data
df1 <- new_dataframe %>%
  select(Client, Ville, Category) %>%
  unique()

df2 <- new_dataframe %>%
  select(Client, Qte, Montant, Date_achat)

# join the data frames on the client id and keep the result
merged <- full_join(df1, df2, by = "Client")

# convert the date strings (dd/mm/yyyy) to Date
merged$Date_achat <- dmy(merged$Date_achat)

# keep 2015 purchases in categories S7 and S14
df <- merged %>%
  filter(year(Date_achat) == 2015, Category %in% c("S7", "S14"))

# # A tibble: 3 x 6
#   Client Ville     Category   Qte Montant Date_achat
#   <chr>  <chr>     <chr>    <int>   <int> <date>    
# 1 Cl1    Marseille S7          28    2750 2015-05-16
# 2 Cl1    Marseille S7          27    2570 2015-06-03
# 3 Cl3    Marseille S14         25    1240 2015-11-21
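
The question also asks for monthly quantities, which the filter above does not produce on its own. Here is a minimal sketch of that extra step, assuming the merged columns shown in the comments (Category, Qte, Date_achat). The `purchases` data frame and the `purchase_month` and `total_qte` column names are made up for illustration, and 2015 stands in for 2017 because the sample has no 2017 rows:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample in the shape of the merged data from the comments
purchases <- data.frame(
  Client     = c("Cl1", "Cl1", "Cl3", "Cl3", "Cl9"),
  Category   = c("S7", "S7", "S14", "S14", "S10"),
  Qte        = c(28, 27, 25, 18, 22),
  Date_achat = c("16/05/2015", "03/06/2015", "21/11/2015",
                 "21/10/2016", "19/07/2015"),
  stringsAsFactors = FALSE
)

monthly <- purchases %>%
  # parse dd/mm/yyyy strings into Date
  mutate(Date_achat = dmy(Date_achat)) %>%
  # keep one year and the two categories of interest
  filter(year(Date_achat) == 2015, Category %in% c("S7", "S14")) %>%
  # one row per category and calendar month
  group_by(Category, purchase_month = month(Date_achat)) %>%
  summarise(total_qte = sum(Qte), .groups = "drop")

print(monthly)
```

Swap in your own year and category codes in the filter() call. The `.groups` argument needs dplyr 1.0.0 or later; on older versions, drop it and add ungroup() after summarise().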
Paul