My question concerns hierarchical data stored as multiple matrices (one matrix per level of the hierarchy). For each matrix, I would like to compare every element of the upper triangle with its mirror element in the lower triangle.
If the paragraph above is unclear, don't worry; here is a visual demonstration of the problem.
The data originally look like this:
```r
# Wide format - Matrix easily visible
Continent <- c('A__', 'A__', 'A__', 'B==', 'B==')
Country_Origin <- c(1, 2, 3, 1, 2)
Country_Destin_1 <- c(100, 100, 100, 10, 20)
Country_Destin_2 <- c(200, 200, 200, 20, 10)
Country_Destin_3 <- c(300, 300, 300, NA, NA)
df_wide <- data.frame(Continent, Country_Origin, Country_Destin_1,
                      Country_Destin_2, Country_Destin_3)
df_wide
```

```
> df_wide
  Continent Country_Origin Country_Destin_1 Country_Destin_2 Country_Destin_3
1       A__              1              100              200              300
2       A__              2              100              200              300
3       A__              3              100              200              300
4       B==              1               10               20               NA
5       B==              2               20               10               NA
```
I reshape the above (wide) format into long format.
```r
# Long format
Continent <- c('A__', 'A__', 'A__', 'A__', 'A__',
               'A__', 'A__', 'A__', 'A__',
               'B==', 'B==', 'B==', 'B==')
Country_Origin <- c(1,1,1, 2,2,2, 3,3,3, 1,1, 2,2)
Country_Destin <- c(1,2,3, 1,2,3, 1,2,3, 1,2, 1,2)
Population_From_To <- c(100, 200, 300, 100, 200, 300, 100, 200, 300,
                        10, 20, 20, 10)
df <- data.frame(Continent, Country_Origin, Country_Destin, Population_From_To)
df$From_To <- with(df, interaction(Country_Origin, Country_Destin))
df
```

```
> df
   Continent Country_Origin Country_Destin Population_From_To From_To
1        A__              1              1                100     1.1
2        A__              1              2                200     1.2
3        A__              1              3                300     1.3
4        A__              2              1                100     2.1
5        A__              2              2                200     2.2
6        A__              2              3                300     2.3
7        A__              3              1                100     3.1
8        A__              3              2                200     3.2
9        A__              3              3                300     3.3
10       B==              1              1                 10     1.1
11       B==              1              2                 20     1.2
12       B==              2              1                 20     2.1
13       B==              2              2                 10     2.2
```
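The long table above is typed in by hand. As a sketch, assuming the `df_wide` layout shown earlier with destination columns named `Country_Destin_<k>` (where `<k>` is the destination id), it can also be derived from `df_wide` in base R by stacking one small data frame per destination column and dropping the `NA` cells. `dest_cols` and `df_long` are illustrative names.

```r
# Wide data as in the question
df_wide <- data.frame(
  Continent        = c("A__", "A__", "A__", "B==", "B=="),
  Country_Origin   = c(1, 2, 3, 1, 2),
  Country_Destin_1 = c(100, 100, 100, 10, 20),
  Country_Destin_2 = c(200, 200, 200, 20, 10),
  Country_Destin_3 = c(300, 300, 300, NA, NA)
)

# Assumption: destination columns are named Country_Destin_<k>
dest_cols <- grep("^Country_Destin_", names(df_wide), value = TRUE)

# One long chunk per destination column, stacked with rbind
df_long <- do.call(rbind, lapply(dest_cols, function(col) {
  data.frame(Continent          = df_wide$Continent,
             Country_Origin     = df_wide$Country_Origin,
             Country_Destin     = as.numeric(sub("^Country_Destin_", "", col)),
             Population_From_To = df_wide[[col]])
}))

# Drop cells that do not exist (continent B== has no country 3), then sort
df_long <- df_long[!is.na(df_long$Population_From_To), ]
df_long <- df_long[order(df_long$Continent, df_long$Country_Origin,
                         df_long$Country_Destin), ]
rownames(df_long) <- NULL
df_long$From_To <- with(df_long, interaction(Country_Origin, Country_Destin))
df_long
```

Packages such as tidyr offer `pivot_longer()` for the same reshaping step if a non-base solution is acceptable.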
I would like to compare the population of each 'From_To' pair with the population of the reversed pair, 'To_From', within the same continent. For example, compare the population that moved from country 1 to country 3 (From_To = 1.3) with the population that moved in the opposite direction (To_From = 3.1).
I assume that I first need to create a reversed 'From_To' variable (namely, 'To_From') per continent, perhaps manually if I treat 'From_To' as a string. Then, within each continent, I need to copy the population of each 'From_To' into the row with the matching 'To_From'.
Put differently, if the data were still in matrix format, I would be comparing an element of the upper triangle, e.g. [1,3], with its mirror in the lower triangle, [3,1].
I thought that with the data in long format, and the mirrored elements in two separate variables ('Population_From_To' and 'Population_To_From'), the comparison would be easier.
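A sketch of the first step, assuming the long `df` built above: `To_From` can be obtained either by calling `interaction()` with the two country columns swapped, or, treating `From_To` as a string, by swapping the parts on either side of the dot with a regular expression (this assumes country ids contain no dots). `To_From_chr` is a hypothetical column name for the string version. This step is row-wise, so it does not need to be done per continent.

```r
df <- data.frame(
  Continent          = rep(c("A__", "B=="), c(9, 4)),
  Country_Origin     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  Country_Destin     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2),
  Population_From_To = c(100, 200, 300, 100, 200, 300, 100, 200, 300, 10, 20, 20, 10)
)
df$From_To <- with(df, interaction(Country_Origin, Country_Destin))

# Option 1: same helper as for From_To, but with destination first
df$To_From <- with(df, interaction(Country_Destin, Country_Origin))

# Option 2: string manipulation, "1.3" -> "3.1" (assumes ids contain no ".")
df$To_From_chr <- sub("^(.*)\\.(.*)$", "\\2.\\1", as.character(df$From_To))
df
```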
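If the matrix view is more convenient, one option (a sketch, assuming the long `df` from the question) is to rebuild one origin-by-destination matrix per continent with `split()` and `tapply()`. For a matrix `M`, `t(M)[i, j]` equals `M[j, i]`, so `M - t(M)` compares every upper-triangle cell with its lower-triangle mirror at once. `mats` and `diffs` are illustrative names.

```r
df <- data.frame(
  Continent          = rep(c("A__", "B=="), c(9, 4)),
  Country_Origin     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  Country_Destin     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2),
  Population_From_To = c(100, 200, 300, 100, 200, 300, 100, 200, 300, 10, 20, 20, 10)
)

# One matrix per continent: rows = origin, columns = destination.
# Each (origin, destination) pair occurs once, so sum() just returns the value;
# pairs absent from the data become NA.
mats <- lapply(split(df, df$Continent), function(d) {
  tapply(d$Population_From_To, list(d$Country_Origin, d$Country_Destin), sum)
})

# M[i, j] - M[j, i] for every pair; 0 means both directions carry the same population
diffs <- lapply(mats, function(M) M - t(M))
diffs[["A__"]]
```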
```
# For example, a manual entry of the variables that
# I wish I knew how to calculate in R.
# The last two columns (To_From and Popul_To_From) are the ones I need R to generate.
   Continent Country_Origin Country_Destin From_To Popul_From_To To_From Popul_To_From
1        A__              1              1     1.1           100     1.1           100
2        A__              1              2     1.2           200     2.1           100
3        A__              1              3     1.3           300     3.1           100
4        A__              2              1     2.1           100     1.2           200
5        A__              2              2     2.2           200     2.2           200
6        A__              2              3     2.3           300     3.2           200
7        A__              3              1     3.1           100     1.3           300
8        A__              3              2     3.2           200     2.3           300
9        A__              3              3     3.3           300     3.3           300
10       B==              1              1     1.1            10     1.1            10
11       B==              1              2     1.2            20     2.1            20
12       B==              2              1     2.1            20     1.2            20
13       B==              2              2     2.2            10     2.2            10
```
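One way to produce that table without an explicit loop (a sketch, not the only approach) is to build a key from continent, origin and destination, build the same key with origin and destination swapped, and look up each row's mirror with `match()`. Because the continent is part of the key, the lookup never crosses continents. `key()`, `row_key`, `mirror_key` and `out` are hypothetical names chosen for illustration.

```r
df <- data.frame(
  Continent          = rep(c("A__", "B=="), c(9, 4)),
  Country_Origin     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  Country_Destin     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2),
  Population_From_To = c(100, 200, 300, 100, 200, 300, 100, 200, 300, 10, 20, 20, 10)
)

# Hypothetical helper: one string identifying a flow within a continent
key <- function(continent, from, to) paste(continent, from, to, sep = "|")

row_key    <- key(df$Continent, df$Country_Origin, df$Country_Destin)
mirror_key <- key(df$Continent, df$Country_Destin, df$Country_Origin)

out <- data.frame(
  Continent      = df$Continent,
  Country_Origin = df$Country_Origin,
  Country_Destin = df$Country_Destin,
  From_To        = interaction(df$Country_Origin, df$Country_Destin),
  Popul_From_To  = df$Population_From_To,
  To_From        = interaction(df$Country_Destin, df$Country_Origin),
  # population of the row whose key equals this row's swapped key
  Popul_To_From  = df$Population_From_To[match(mirror_key, row_key)]
)
out
```

If a flow has no reverse row in the data, `match()` returns `NA` for it, so `Popul_To_From` is `NA` there. Once both columns exist, the comparison is an ordinary column operation, e.g. `out$Popul_From_To - out$Popul_To_From`.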
My understanding is that I need to:
a) use lapply (since I work at the Continent level);
b) fill 'To_From' with the reversed structure of 'From_To';
c) for each row, take the population of the row whose 'From_To' matches this row's 'To_From', within each Continent.
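Steps a) to c) can be followed almost literally with `split()` and `lapply()`. This is a sketch under the same assumptions (the long `df` from the question); here `From_To` and `To_From` are plain strings built with `paste()` rather than `interaction()` factors, and `per_continent` and `out` are illustrative names.

```r
df <- data.frame(
  Continent          = rep(c("A__", "B=="), c(9, 4)),
  Country_Origin     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  Country_Destin     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2),
  Population_From_To = c(100, 200, 300, 100, 200, 300, 100, 200, 300, 10, 20, 20, 10)
)

# a) work on one continent at a time
per_continent <- lapply(split(df, df$Continent), function(d) {
  d$From_To <- paste(d$Country_Origin, d$Country_Destin, sep = ".")
  # b) To_From: the same pair with origin and destination swapped
  d$To_From <- paste(d$Country_Destin, d$Country_Origin, sep = ".")
  # c) population of the row (in this continent) whose From_To equals this To_From
  d$Popul_To_From <- d$Population_From_To[match(d$To_From, d$From_To)]
  d
})

# Recombine the continents into one data frame
out <- do.call(rbind, per_continent)
rownames(out) <- NULL
out
```

Splitting by continent is what keeps the lookup inside one continent. If the continent is made part of the lookup key instead, the same result can be obtained without `lapply()`.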
Any ideas about what this type of data is called would also be helpful, since they would let me search more specifically on the topic. I have started trying to solve the problem in Python, but I would prefer a solution in R.
Thank you in advance :-)