I have a dataframe, shown below, covering 47 states, where each row gives the mean market fare for air travel between a pair of states (the order of State 1 and State 2 does not matter). How can I convert this to a 47x47 matrix in which each row and column is a state name and each entry is the Mean Market Fare between those two states?

First 6 Rows:

  State 1     State 2 Mean Market Fare
1 Alabama     Alabama         263.3752
2 Alabama     Arizona         320.5036
3 Alabama    Arkansas         288.9775
4 Alabama  California         352.6983
5 Alabama    Colorado         282.6864
6 Alabama Connecticut         266.9601

Last 6 Rows:

           State 1   State 2 Mean Market Fare
1097    Washington   Wyoming         286.9314
1098 West Virginia Wisconsin         302.7769
1099 West Virginia   Wyoming         493.2000
1100     Wisconsin Wisconsin         251.3333
1101     Wisconsin   Wyoming         285.3015
1102       Wyoming   Wyoming         275.9800
Jim

4 Answers

You can try xtabs, which cross-tabulates the fare by the two state columns:

xtabs(Mean_Market_Fare~.,df)

which gives (pairs that are absent from the data appear as 0):

> xtabs(Mean_Market_Fare~.,df)
               State_2
State_1          Alabama  Arizona Arkansas California Colorado Connecticut Wisconsin  Wyoming
  Alabama       263.3752 320.5036 288.9775   352.6983 282.6864    266.9601    0.0000   0.0000
  Washington      0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 286.9314
  West Virginia   0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  302.7769 493.2000
  Wisconsin       0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  251.3333 285.3015
  Wyoming         0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 275.9800

DATA

df <- structure(list(State_1 = c("Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama", "Washington", "West Virginia", "West Virginia", 
"Wisconsin", "Wisconsin", "Wyoming"), State_2 = c("Alabama", 
"Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Wyoming", "Wisconsin", "Wyoming", "Wisconsin", "Wyoming", "Wyoming"
), Mean_Market_Fare = c(263.3752, 320.5036, 288.9775, 352.6983, 
282.6864, 266.9601, 286.9314, 302.7769, 493.2, 251.3333, 285.3015, 
275.98)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"))
ThomasIsCoding
  • This is perfect, thank you. All I had to do additionally was convert the xtabs result to Matrix form and then add `m = forceSymmetric(m, uplo = "U")` to make it a symmetric matrix instead of upper triangular. Thank you so much, this was super helpful. – Jim Mar 24 '20 at 00:20
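
As the comment above notes, the pair list stores each pair only once, so the cross-tabulation is upper triangular. A minimal base R sketch of one way to get a symmetric matrix directly, without the Matrix package, is to fill a square matrix by name in both directions. The toy states and fares below are made up for illustration; pairs absent from the data stay NA rather than 0.

```r
# Toy pair list (hypothetical values); each unordered pair appears once,
# and the pair B-C is deliberately missing.
df <- data.frame(
  State_1 = c("A", "A", "A", "B", "C"),
  State_2 = c("A", "B", "C", "B", "C"),
  Mean_Market_Fare = c(10, 20, 30, 40, 60)
)

# as.character() guards against factor columns, whose integer codes
# would otherwise be used as positions.
s1 <- as.character(df$State_1)
s2 <- as.character(df$State_2)
states <- sort(unique(c(s1, s2)))

# Square matrix with the same names on rows and columns, NA by default.
m <- matrix(NA_real_, nrow = length(states), ncol = length(states),
            dimnames = list(states, states))

# A two-column character matrix indexes cells by (row name, column name).
m[cbind(s1, s2)] <- df$Mean_Market_Fare  # as given
m[cbind(s2, s1)] <- df$Mean_Market_Fare  # mirrored

m
```

Whether a missing pair should be NA or 0 depends on what the matrix is used for; NA keeps "no data" distinct from a fare of zero.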
You can use pivot_wider from tidyr to reshape your dataframe into a wider format.

Here, using the first lines of your example in a dataframe called "df":

df
    State1      State2 Mean_Market_Fare
1: Alabama     Alabama         263.3752
2: Alabama     Arizona         320.5036
3: Alabama    Arkansas         288.9775
4: Alabama  California         352.6983
5: Alabama    Colorado         282.6864
6: Alabama Connecticut         266.9601

You can do:

library(tidyr)
library(dplyr)
df %>% pivot_wider(names_from = State2, values_from = Mean_Market_Fare)

   State1  Alabama  Arizona Arkansas California Colorado Connecticut
1 Alabama 263.3752 320.5036 288.9775   352.6983 282.6864    266.9601

Does this answer your question?


Reproducible example

structure(list(State1 = c("Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama"), State2 = c("Alabama", "Arizona", "Arkansas", 
"California", "Colorado", "Connecticut"), Mean_Market_Fare = c(263.3752, 
320.5036, 288.9775, 352.6983, 282.6864, 266.9601)), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))
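
Note that pivot_wider returns a data frame (a tibble) with the row labels in a State1 column, not a matrix. A hedged sketch of the extra step, on made-up toy values: drop the label column, convert with as.matrix, and set the row names. Pairs missing from the data come out as NA.

```r
library(tidyr)

# Toy data (hypothetical values); the pair B-A is not listed.
df <- data.frame(
  State1 = c("A", "A", "B"),
  State2 = c("A", "B", "B"),
  Mean_Market_Fare = c(10, 20, 40)
)

wide <- pivot_wider(df, names_from = State2, values_from = Mean_Market_Fare)

# Everything except the State1 label column is numeric.
m <- as.matrix(wide[, -1])
rownames(m) <- wide$State1

m
```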
dc37
Use data.table's dcast() function, spreading on the State 2 variable (the argument is value.var, and it takes the column name as a string):

dcast(dtName, state1 ~ state2, value.var = "meanMarketFare")

A toy example...

library(data.table)
DT1 <- data.table(
  "V1" = c("a", "a", "b"),
  "V2" = c("b", "c", "c"),
  "V3" = c(2,6,9))

dcast(DT1, V1 ~ V2, value.var = "V3")

Gives

   V1  b c
1:  a  2 6
2:  b NA 9

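Note you could also shorten it to the following, in which case dcast guesses the value column and prints a message saying which one it used:

dcast(DT1, ... ~ V2)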
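
dcast returns a data.table with the row labels in the first column. If a plain matrix is needed, one option (a sketch on the toy data above) is as.matrix with the rownames argument that data.table provides for turning a column into row names.

```r
library(data.table)

DT1 <- data.table(
  V1 = c("a", "a", "b"),
  V2 = c("b", "c", "c"),
  V3 = c(2, 6, 9)
)

wide <- dcast(DT1, V1 ~ V2, value.var = "V3")

# Use the V1 column as row names; the remaining columns become the matrix body.
m <- as.matrix(wide, rownames = "V1")

m
```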
rg255
df <- data.frame(state1=c(rep("a", 3), rep("b", 3), rep("c", 3)),
                 state2=rep(c("a", "b", "c"), 3),
                 dist=c(1, 3, 2, 4, 3, 2, 4, 1, 3))

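# Assumes every ordered pair (state1, state2) appears exactly once in df,
# so that each state contributes a full row of values.
pairwise_df2matrix <- function(df, value_col) {
  # Sort by first, then second state so the columns line up across rows
  df <- df[order(df[, 1], df[, 2], decreasing=FALSE), ]
  # One data frame per first state
  dfs <- split(df, df[, 1])
  # Stack each state's values as one row of the matrix
  m <- Reduce(rbind, lapply(dfs, function(df) df[, value_col]))
  colnames(m) <- names(dfs)
  rownames(m) <- names(dfs)
  m
}

pairwise_df2matrix(df, "dist")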
Gwang-Jin Kim