0

I am new in the Stackoverflow environment and i am trying to sort/create arrays so that i can work with these arrays. I am trying to get the mean and std.

I have a data set of almost 50.000 observations.

An example of the dataset is shown below.

| Person   | Product | Date       | Price |
|----------|---------|------------|-------|
| Chris    | Pear    | 01-02-2018 | 10    |
| Tom      | Pear    | 02-02-2018 | 11    |
| John     | Pear    | 03-02-2018 | 12    |
| Bill     | Pear    | 04-02-2018 | 13    |
| Someone  | Pear    | 05-02-2018 | 14    |
| Chris    | Pear    | 06-02-2018 | 15    |
| Tom2     | Apples  | 07-02-2018 | 16    |
| John     | Pear    | 08-02-2018 | 17    |
| Bill2    | Pear    | 09-02-2018 | 18    |
| Someone2 | Pear    | 10-02-2018 | 19    |



Mean price: 14.5
STD:3.028

What I want to have is an array (FOR each of the prices) so i would know what the mean price and std. were at the current date. That would give me only the most recent observation, based on the criteria: Person, Product

So I would end up with something like this (for Pears) at the date 10-02-2018:

+----------+---------+------------+-------+
| Person   | Product | Date       | Price |
+----------+---------+------------+-------+
| Tom      | Pear    | 02-02-2018 | 11    |
+----------+---------+------------+-------+
| Bill     | Pear    | 04-02-2018 | 13    |
+----------+---------+------------+-------+
| Someone  | Pear    | 05-02-2018 | 14    |
+----------+---------+------------+-------+
| Chris    | Pear    | 06-02-2018 | 15    |
+----------+---------+------------+-------+
| John     | Pear    | 08-02-2018 | 17    |
+----------+---------+------------+-------+
| Bill2    | Pear    | 09-02-2018 | 18    |
+----------+---------+------------+-------+
| Someone2 | Pear    | 10-02-2018 | 19    |
+----------+---------+------------+-------+

Mean price: 15.29
Std: 2.87

Hope that some is able to help out!

In advance many many thanks.

Mastofoz
  • 7
  • 3
  • So i am apparently also incredible bad at creating tables. I will try to find out how to edit so it shows it in a proper way. – Mastofoz Apr 07 '19 at 11:34
  • Take a look at the way that @Talha edited your post to see formatting for tables. – G5W Apr 07 '19 at 11:40
  • This is very easy to do in R. Can you provide a reproducible example? If you need help on how to do that, please see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Apr 07 '19 at 11:43
  • Hi i will look into the reproducible example and come back to this, Thanks!! – Mastofoz Apr 07 '19 at 11:49
  • So is the below a reproducible example or? – Mastofoz Apr 09 '19 at 08:08

1 Answers1

0

Reproducing your data:

dat <- read.table(
  text = gsub(
    "[[:punct:]]", 
    "", 
    "| Person   | Product | Date       | Price |
     |----------|---------|------------|-------|
     | Chris    | Pear    | 01-02-2018 | 10    |
     | Tom      | Pear    | 02-02-2018 | 11    |
     | John     | Pear    | 03-02-2018 | 12    |
     | Bill     | Pear    | 04-02-2018 | 13    |
     | Someone  | Pear    | 05-02-2018 | 14    |
     | Chris    | Pear    | 06-02-2018 | 15    |
     | Tom2     | Apples  | 07-02-2018 | 16    |
     | John     | Pear    | 08-02-2018 | 17    |
     | Bill2    | Pear    | 09-02-2018 | 18    |
     | Someone2 | Pear    | 10-02-2018 | 19    |"
    ),
  header = T, colClasses = c(rep("character", 3), "integer")
) 

Cleaning the resulting table:

library(tidyverse)
library(magrittr)

dat %<>%
  mutate_if(is.character, funs(gsub("\\s+", "", .))) %>%
  mutate(Date = as.Date(Date, "%d%m%Y"))

Answering your question:

dat %>%                                    # replace %>% with %<>% to save changes
  group_by(Person, Product) %>%            # group by Person and Product  
  filter(Date == max(Date)) %>%            # Leave only most resent records
  ungroup() %>%                            # ungroup data
  arrange(Product, Date) %>%               # sort by Product, Date
  mutate(Date = format(Date, "%d-%m-%Y"))  # output date as in desired output

# A tibble: 8 x 4
  Person   Product Date       Price
  <chr>    <chr>   <chr>      <int>
1 Tom2     Apples  07-02-2018    16
2 Tom      Pear    02-02-2018    11
3 Bill     Pear    04-02-2018    13
4 Someone  Pear    05-02-2018    14
5 Chris    Pear    06-02-2018    15
6 John     Pear    08-02-2018    17
7 Bill2    Pear    09-02-2018    18
8 Someone2 Pear    10-02-2018    19
utubun
  • 4,400
  • 1
  • 14
  • 17
  • Hi @utubun So i think i was not completely clear in my question with all the rows in the dataset i need to calculate the mean and std. for each price. So to say i need to find all unique prices depended on the different criteria in the other columns. This should then be looking from the date of the price and three month back. – Mastofoz Apr 09 '19 at 06:19