I'm working with a complex matrix (complex to me, at least).

It is something like this:

      Invoice.1   Invoice.2   Invoice.3               mtime
1   21605000182 21605000183          NA 2017-01-16 19:51:33
2   21605000182 21605000183          NA 2017-01-16 19:51:33
3   21605000182 21605000183          NA 2017-01-16 19:51:33
4   21605000182 21605000183          NA 2017-01-16 19:51:33
5   21510000669 21602000125 21608000366 2017-01-20 13:28:36
6   21609000856          NA          NA 2017-01-20 13:28:36
7   21606000405 21608000354 21608000356 2017-01-20 13:28:36
8   21610000133          NA          NA 2017-01-20 13:28:36
9   21604000592 21605000604 21605000608 2017-01-20 13:28:36
10  21609001012          NA          NA 2017-01-20 13:28:36

I would like to combine all the Invoice columns into one, in order to remove the "NA" values and the duplicates, while keeping each invoice matched with the date in the last column, which is the date of the claim.

Something like this:

      Invoice          mtime
1   21605000182 2017-01-16 19:51:33
2   21605000182 2017-01-16 19:51:33
3   21605000182 2017-01-16 19:51:33
4   21605000182 2017-01-16 19:51:33
5   21510000669 2017-01-20 13:28:36
6   21609000856 2017-01-20 13:28:36
7   21606000405 2017-01-20 13:28:36
8   21610000133 2017-01-20 13:28:36
9   21604000592 2017-01-20 13:28:36
10  21609001012 2017-01-20 13:28:36
11  21605000183 2017-01-16 19:51:33
12  21605000183 2017-01-16 19:51:33
13  21605000183 2017-01-16 19:51:33
14  21605000183 2017-01-16 19:51:33
15  21602000125 2017-01-20 13:28:36
16  21608000354 2017-01-20 13:28:36
  • `unique(df)` will get rid of duplicate rows in your initial data.frame. Then you want to reshape it to long format: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format – lmo Jul 03 '17 at 13:02
  • As lmo already suggested, use something like `library(reshape2); melt(data)` or `library(tidyverse); data %>% gather(key, value, -mtime)` – Roman Jul 03 '17 at 13:03
  • I think you can use the tidyr library, with something like `data <- data %>% gather(tmp, Invoice, c(Invoice.1,Invoice.2,Invoice.3))` – CClaire Jul 03 '17 at 13:05
  • @lmo It is necessary to reshape the matrix before applying `unique`. I'm trying to apply `data %>% gather...`, but it has not solved the problem so far – Álvaro Rodríguez Jul 03 '17 at 13:58
  • In your example, row 1 is a duplicate of row 2. In this instance, using `unique` initially will reduce the burden on the reshape algorithm and could potentially speed up the process significantly. You would have to use `unique` again after the reshape. There are many solutions in the link that I provided. – lmo Jul 03 '17 at 14:02
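
The sequence lmo describes in the comments (deduplicate, reshape, deduplicate again) can be sketched in base R without extra packages. This is only an illustration under assumptions: the `df` below is a hypothetical three-row subset of the question's data, the names `invoice_cols` and `long` are made up for the example, and `mtime` is kept as a character column for brevity.

```r
# Hypothetical subset of the question's data (mtime kept as character)
df <- data.frame(
  Invoice.1 = c("21605000182", "21605000182", "21609000856"),
  Invoice.2 = c("21605000183", "21605000183", NA),
  Invoice.3 = c(NA_character_, NA, NA),
  mtime     = c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                "2017-01-20 13:28:36"),
  stringsAsFactors = FALSE
)

# 1. Drop duplicated rows first, so there is less data to reshape
df <- unique(df)

# 2. Reshape wide to long: stack the Invoice.* columns one after another
#    and repeat mtime once per stacked column
invoice_cols <- grep("^Invoice\\.", names(df), value = TRUE)
long <- data.frame(
  Invoice = unlist(df[invoice_cols], use.names = FALSE),
  mtime   = rep(df$mtime, times = length(invoice_cols)),
  stringsAsFactors = FALSE
)

# 3. Remove NA invoices, then drop any duplicates created by the reshape
long <- unique(long[!is.na(long$Invoice), ])
rownames(long) <- NULL
long
```

Each invoice keeps the `mtime` of the row it came from because `unlist()` stacks the columns in order and `rep()` repeats the `mtime` vector once per column.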

2 Answers

Example using data.table (this should be faster than the other solutions):

library(data.table)

# Small example table: two invoice columns plus the mtime column
DT <- data.table(Invoice.1 = 1:3, Invoice.2 = c(1L,4L,5L), mtime = 11:13)
DT

   Invoice.1 Invoice.2 mtime
1:         1         1    11
2:         2         4    12
3:         3         5    13

rez <- melt(DT, measure.vars = paste0("Invoice.", 1:2),
            value.name = "Invoice")
rez[, variable := NULL]
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    11       1
5:    12       4
6:    13       5

rez <- unique(rez)
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    12       4
5:    13       5
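
The toy example above has no missing values, so here is a hedged sketch of the same melt-then-unique approach on data shaped like the question's, with three invoice columns and `NA` entries. The table is a hypothetical four-row subset of the question's rows, `mtime` is kept as character for brevity, and `invoice_cols` is a name introduced only for this illustration. It relies on the `na.rm` argument of data.table's `melt` to drop the missing invoices.

```r
library(data.table)

# Hypothetical subset of the question's rows (mtime kept as character)
DT <- data.table(
  Invoice.1 = c("21605000182", "21605000182", "21510000669", "21609000856"),
  Invoice.2 = c("21605000183", "21605000183", "21602000125", NA),
  Invoice.3 = c(NA, NA, "21608000366", NA),
  mtime     = c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                "2017-01-20 13:28:36", "2017-01-20 13:28:36")
)

# Select every column whose name starts with "Invoice."
invoice_cols <- grep("^Invoice\\.", names(DT), value = TRUE)

# Wide to long; na.rm = TRUE drops rows whose Invoice value is NA
rez <- melt(DT, id.vars = "mtime", measure.vars = invoice_cols,
            value.name = "Invoice", na.rm = TRUE)

# The source column name is not needed, so drop it before deduplicating
rez[, variable := NULL]
rez <- unique(rez)
rez
```

Selecting the columns with `grep` avoids hard-coding how many `Invoice.N` columns there are.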
minem

The gather function from the tidyr package can do what you are looking for: it transforms a data.frame from wide format to long format.

library(tidyr)
library(readr)

# Create a temp file to store the example data
data_file <- tempfile()

cat(
"Invoice.1,Invoice.2,Invoice.3,mtime
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21510000669,21602000125,21608000366,2017-01-20 13:28:36
21609000856,NA,NA,2017-01-20 13:28:36
21606000405,21608000354,21608000356,2017-01-20 13:28:36
21610000133,NA,NA,2017-01-20 13:28:36
21604000592,21605000604,21605000608,2017-01-20 13:28:36
21609001012,NA,NA,2017-01-20 13:28:36",
file = data_file,
append = FALSE)

# Read the data from the temp file into a data.frame called `invoices`
invoices <-
  readr::read_csv(file = data_file, col_types = "cccT")

# View the data
invoices
# # A tibble: 10 x 4
#      Invoice.1   Invoice.2   Invoice.3               mtime
#          <chr>       <chr>       <chr>              <dttm>
#  1 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  2 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  3 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  4 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  5 21510000669 21602000125 21608000366 2017-01-20 13:28:36
#  6 21609000856        <NA>        <NA> 2017-01-20 13:28:36
#  7 21606000405 21608000354 21608000356 2017-01-20 13:28:36
#  8 21610000133        <NA>        <NA> 2017-01-20 13:28:36
#  9 21604000592 21605000604 21605000608 2017-01-20 13:28:36
# 10 21609001012        <NA>        <NA> 2017-01-20 13:28:36

# Use the gather function from the tidyr package to transform the data from
# wide format to long format. na.rm = TRUE drops the rows with a missing Invoice.

tidyr::gather(invoices, key = key, value = Invoice, -mtime, na.rm = TRUE) %>% print(n = Inf)
# # A tibble: 20 x 3
#                  mtime       key     Invoice
#  *              <dttm>     <chr>       <chr>
#  1 2017-01-16 19:51:33 Invoice.1 21605000182
#  2 2017-01-16 19:51:33 Invoice.1 21605000182
#  3 2017-01-16 19:51:33 Invoice.1 21605000182
#  4 2017-01-16 19:51:33 Invoice.1 21605000182
#  5 2017-01-20 13:28:36 Invoice.1 21510000669
#  6 2017-01-20 13:28:36 Invoice.1 21609000856
#  7 2017-01-20 13:28:36 Invoice.1 21606000405
#  8 2017-01-20 13:28:36 Invoice.1 21610000133
#  9 2017-01-20 13:28:36 Invoice.1 21604000592
# 10 2017-01-20 13:28:36 Invoice.1 21609001012
# 11 2017-01-16 19:51:33 Invoice.2 21605000183
# 12 2017-01-16 19:51:33 Invoice.2 21605000183
# 13 2017-01-16 19:51:33 Invoice.2 21605000183
# 14 2017-01-16 19:51:33 Invoice.2 21605000183
# 15 2017-01-20 13:28:36 Invoice.2 21602000125
# 16 2017-01-20 13:28:36 Invoice.2 21608000354
# 17 2017-01-20 13:28:36 Invoice.2 21605000604
# 18 2017-01-20 13:28:36 Invoice.3 21608000366
# 19 2017-01-20 13:28:36 Invoice.3 21608000356
# 20 2017-01-20 13:28:36 Invoice.3 21605000608
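
The output above still contains repeated invoice/date pairs, and the question also asks for duplicates to be removed. A minimal sketch of that extra step follows, under some assumptions: it uses `pivot_longer`, which supersedes `gather` in tidyr 1.0.0 and later, together with `distinct` from dplyr, a package the answer above does not load. The `invoices` tibble here is a hypothetical three-row subset of the question's data, and the `tz = "UTC"` setting is chosen only to make the example reproducible.

```r
library(tidyr)
library(dplyr)

# Hypothetical subset of the question's rows
invoices <- tibble(
  Invoice.1 = c("21605000182", "21605000182", "21606000405"),
  Invoice.2 = c("21605000183", "21605000183", "21608000354"),
  Invoice.3 = c(NA, NA, "21608000356"),
  mtime     = as.POSIXct(c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                           "2017-01-20 13:28:36"), tz = "UTC")
)

long <- invoices %>%
  # Wide to long; values_drop_na = TRUE plays the role of gather's na.rm = TRUE
  pivot_longer(cols = starts_with("Invoice."), names_to = "key",
               values_to = "Invoice", values_drop_na = TRUE) %>%
  # Keep only the two columns of interest, then drop repeated pairs
  select(Invoice, mtime) %>%
  distinct()

long
```

Unlike `gather`, `pivot_longer` emits the values row by row rather than column by column, so the row order differs from the output shown above even though the invoice/date pairs are the same.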
Peter