0

I have about 15 data frames that each look like the below example. Each data frame represents the number of faults a machine has had on a given day for the 12 months 2019-07-01 to 2020-06-30

Machine #1
#        date         n
# 1      2019-07-01   8
# 2      2019-07-02   6
# 3      2019-07-03   10
# 4      2019-07-05   1
# 5      2019-07-06   2
# 6      2019-07-08   6
# .      ..........   .
# n      2020-06-30   5

I would like to use this tool here at Flourish Studio to create a bar chart race, but need my data to be of the form;

  Machine Number    2019-07-01   2019-07-02   2019-07-03   2019-07-04   2019-07-05   ......   2020-06-30
# 1     1                8            6            10           0            1                     5
# 2     2                0            4            3            1            0                     3 
# 3     3                2            0            1            3            5                     6
# 4     4                3            1            0            0            9                     11
# .     .                .            .            .            .            .                     .
# 15    15               10           8            4            6            0                     1

So essentially I need to end up with a data frame of 15 rows and 366 columns (one for machine number and 365 for each day of the year)

One important thing to note is that if no faults are reported on a given day, no data is recorded, in other words there are no zeros in any of the data frames, they would need to be added where needed somehow.

I am an absolute beginner and it's taken me weeks just to wrangle the data into its current form, and at this point I'm not sure how to proceed. So any help would be greatly appreciated.

Thanks.

benny86
  • 29
  • 4
  • Base R `rbind` or `dplyr::bind_rows` the df's and then [reshape data from long to wide in R](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format). This seems like a duplicate, if it is say so and I will close it. – Rui Barradas Nov 13 '20 at 06:00
  • Are the df's in a list, in the `.GlobalEnv`, where? – Rui Barradas Nov 13 '20 at 06:02
  • @RuiBarradas Yes I have the 15 df's in the `Global.Env` – benny86 Nov 13 '20 at 06:29

1 Answers1

0

The problem in the question is solved with a sequence of steps, most of them already answered elsewhere in SO but here it goes.

# Get the data.frames into a list
machine_names <- ls(pattern = "^Machine")
machine_list <- mget(machine_names, envir = .GlobalEnv)

# Coerce the date column to class "Date"
# and add a small amount of noise to 'n'
set.seed(2020)
machine_list <- lapply(machine_list, function(x){
  x[["date"]] <- as.Date(x[["date"]])
  x[["n"]] <- x[["n"]] + sample(-2:10, length(x[["n"]]), TRUE)
  x
})

# Create a column with the data.frame number
machine_list <- lapply(seq_along(machine_list), function(i){
  cbind.data.frame(Machine_Number = i, machine_list[[i]])
})
# Put all data.frames in the same df
machines <- do.call(rbind, machine_list)

# Now reshape from long to wide format
library(dplyr)
library(tidyr)    

# Arrange data according do date so pivot_wider gives
# the correct column order.
machines <- machines %>% arrange(date)

machines %>%
  pivot_wider(
    id_cols = Machine_Number,
    names_from = date,
    values_from = n,
    values_fill = 0
  )

Data creation code

temp <- read.table(text = "
        date         n
 1      2019-07-01   8
 2      2019-07-02   6
 3      2019-07-03   10
 4      2019-07-05   1
 5      2019-07-06   2
 6      2019-07-08   6
", header = TRUE)

machine_list <- replicate(15, temp, simplify = FALSE)
names(machine_list) <- paste("Machine", seq_along(machine_list), sep = "_")
list2env(machine_list, envir = .GlobalEnv)
rm(machine_list)
Dan Kennedy
  • 380
  • 1
  • 9
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Wow thank you for the in depth reply. That looks very intimidating to me but I will work through it and let you know how I go. Thanks again. – benny86 Nov 13 '20 at 07:32