1

I have got a matrix, that I generated using tapply.
It looks like this:

      NON-ROAD NONPOINT   ON-ROAD     POINT
1999 522.94000 2107.625 346.82000  296.7950
2002 240.84692 1509.500 134.30882  569.2600
2005 248.93369 1509.500 130.43038 1202.4900
2008  55.82356 1373.207  88.27546  344.9752

as you can see that I don't have any id variable that I can use to melt all the 4 columns into a single column.

The dataset is around 30MB in size : data

to generate the matrix:

NEI <- readRDS("summarySCC_PM25.rds")
data <-  with(NEI[NEI$fips=="24510",],tapply(Emissions,list(year,type),sum))

> class(data)
[1] "matrix"

Expected Output:

   Year   Type     Emission
1  1999 NON-ROAD  522.94000
2  2002 NON-ROAD  240.84692
3  2005 NON-ROAD  248.93369
4  2008 NON-ROAD   55.82356
5  1999 NONPOINT 2107.62500
6  2002 NONPOINT 1509.50000
7  2005 NONPOINT 1509.50000
8  2008 NONPOINT 1373.20731
9  1999  ON-ROAD  346.82000
10 2002  ON-ROAD  134.30882
11 2005  ON-ROAD  130.43038
12 2008  ON-ROAD   88.27546
13 1999    POINT  296.79500
14 2002    POINT  569.26000
15 2005    POINT 1202.49000
16 2008    POINT  344.97518

Also, I have seen solutions that can directly convert the original dataset into my expected output using aggregate. But i can't use that function to maintain a uniformity with my other answers, in my assignment.

It would also be great if the solution is done by native R functions, if possible.

Rishabh Kumar
  • 2,342
  • 3
  • 13
  • 23

2 Answers2

1

One way would be to convert your data to dataframe, add a year column and reshape your data.

result <- reshape(data.frame(data, year = rownames(data), check.names = FALSE), 
                  direction = "long", varying = list(colnames(data)), 
                  v.names = "Emission", times = colnames(data), 
                  idvar = 'year', timevar = 'Type')
rownames(result) <- NULL
result

Using tidyverse function we can do :

library(tibble)
library(tidyr)

data %>%
  as.data.frame() %>%
  rownames_to_column('year') %>%
  pivot_longer(cols = -year, names_to = 'Type', values_to = 'Emission')

#   year  Type     Emission
#   <chr> <chr>       <dbl>
# 1 1999  NON.ROAD    523. 
# 2 1999  NONPOINT   2108. 
# 3 1999  ON.ROAD     347. 
# 4 1999  POINT       297. 
# 5 2002  NON.ROAD    241. 
# 6 2002  NONPOINT   1510. 
# 7 2002  ON.ROAD     134. 
# 8 2002  POINT       569. 
# 9 2005  NON.ROAD    249. 
#10 2005  NONPOINT   1510. 
#11 2005  ON.ROAD     130. 
#12 2005  POINT      1202. 
#13 2008  NON.ROAD     55.8
#14 2008  NONPOINT   1373. 
#15 2008  ON.ROAD      88.3
#16 2008  POINT       345. 

data

data <- structure(c(522.94, 240.84692, 248.93369, 55.82356, 2107.625, 
1509.5, 1509.5, 1373.207, 346.82, 134.30882, 130.43038, 88.27546, 
296.795, 569.26, 1202.49, 344.9752), .Dim = c(4L, 4L), .Dimnames = list(
    c("1999", "2002", "2005", "2008"), c("NON.ROAD", "NONPOINT", "ON.ROAD", "POINT")))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I get `Error in [.data.frame (data, , varying.i) : undefined columns selected` on executing the first way. – Rishabh Kumar Feb 21 '21 at 11:44
  • Do you have other columns in your data which are not shown? I updated my answer with the data that I am using for the answer which works without any error on my end. Can you try the answer with that data and see if it works for you. – Ronak Shah Feb 21 '21 at 12:00
  • please see this link : https://pastebin.com/RVsUuAPp . Its unmodified form of concurrent command execution on my R console. When I use your data, it works then. I checked, both of our data are `matrix`, both of them look same. Still they work differently. If I am missing something, that your sharp eyes could catch, please tell me. – Rishabh Kumar Feb 21 '21 at 12:24
  • @RishabhKumar Can you provide `dput(data)` so that it is easier to verify the data. – Ronak Shah Feb 21 '21 at 14:12
  • `structure(c(522.94, 240.846922272992, 248.9336897056, 55.8235614579, 2107.625, 1509.5, 1509.5, 1373.2073079721, 346.82, 134.308821199349, 130.430379885892, 88.275457392443, 296.795, 569.26, 1202.49, 344.975183943), .Dim = c(4L, 4L), .Dimnames = list(c("1999", "2002", "2005", "2008"), c("NON-ROAD", "NONPOINT", "ON-ROAD", "POINT")))` – Rishabh Kumar Feb 21 '21 at 14:47
  • 1
    @RishabhKumar Thank you for sharing `dput`, the error comes up because you have non-standard column names with `"-"` in them which R changes to `"."`. I have updated the answer to include `check.names = FALSE` which will keep the column names as they are. You can try the updated answer. – Ronak Shah Feb 21 '21 at 23:51
  • what does `check.names` does, can you provide a link where I can learn about it? – Rishabh Kumar Feb 22 '21 at 05:38
  • R does not allow column names which have special symbols in it or that start with number. So to make them valid column names it replaces it with dot (unless you specify `check.names = FALSE`). Check `check.names` description in `?read.table`. – Ronak Shah Feb 22 '21 at 06:04
1

Exploit the fact that in R matrices are actually represented as vectors with dimensions attributes. Using as.vector strips off the dimensions and lines up the column data. All you have to do then is to bind together a data.frame with the dimnames.

res <- data.frame(year=rownames(m),
                  type=rep(colnames(m), each=nrow(m)), 
                  emission=as.vector(m))
res
#    year     type   emission
# 1  1999 NON.ROAD  522.94000
# 2  2002 NON.ROAD  240.84692
# 3  2005 NON.ROAD  248.93369
# 4  2008 NON.ROAD   55.82356
# 5  1999 NONPOINT 2107.62500
# 6  2002 NONPOINT 1509.50000
# 7  2005 NONPOINT 1509.50000
# 8  2008 NONPOINT 1373.20700
# 9  1999  ON.ROAD  346.82000
# 10 2002  ON.ROAD  134.30882
# 11 2005  ON.ROAD  130.43038
# 12 2008  ON.ROAD   88.27546
# 13 1999    POINT  296.79500
# 14 2002    POINT  569.26000
# 15 2005    POINT 1202.49000
# 16 2008    POINT  344.97520

Data:

m <- structure(c(522.94, 240.84692, 248.93369, 55.82356, 2107.625, 
1509.5, 1509.5, 1373.207, 346.82, 134.30882, 130.43038, 88.27546, 
296.795, 569.26, 1202.49, 344.9752), .Dim = c(4L, 4L), .Dimnames = list(
    c("1999", "2002", "2005", "2008"), c("NON.ROAD", "NONPOINT", 
    "ON.ROAD", "POINT")))
jay.sf
  • 60,139
  • 8
  • 53
  • 110