4

I have 2 lists with names of cities. both the lists contain exactly the same city names. I have to create city pairs out of these lists. So for example, the 1st list contains names like London, Paris, Moscow, Geneva and Tokyo. List 2 contains the exact same names. However the third list should contains pairs like "London-Paris", "London-Tokyo","London-Geneva" and so on but not "Tokyo-London" or " Paris-London" since that would be double counting. Any help either in R or excel would be appreciated.

I have tried using 'combn' function in R. However I have around 4500 observation and 'combn' function did not work for me.

N.V
  • 43
  • 4
  • 1
    Can you have pairs of the same city? e.g. "London-London" – Tom Haddow Feb 05 '19 at 14:33
  • 1
    Not sure why @utubun deleted his answer, but `combn(myList, 2, paste, collapse = "-")` works great. – acylam Feb 05 '19 at 14:37
  • @avid_useR I was thinking about 4500 cities, and how much time it takes for `combn` to do the job. Undeleted it. – utubun Feb 05 '19 at 14:41
  • 2
    Also, not sure why this question has 3 close votes of "Too Broad". While it might be trivial to some, the question has a clear goal, expected input and output. The only thing I would add is code that OP has tried. If it's a duplicate, mark it as one. "Too Broad" is not a good reason IMHO. – acylam Feb 05 '19 at 14:46
  • 1
    Check this question for faster methods than `combn`: https://stackoverflow.com/questions/26828301/faster-version-of-combn – acylam Feb 05 '19 at 14:56

5 Answers5

3

(You don't really need second list to do that, one is enough)

cities  <- list("London", "Paris", "Kyiv", "Geneva", "Tokyo")

combn(cities, 2, paste, collapse = "-")

# [1] "London-Paris"  "London-Kyiv"   "London-Geneva" "London-Tokyo"  "Paris-Kyiv"   
# [6] "Paris-Geneva"  "Paris-Tokyo"   "Kyiv-Geneva"   "Kyiv-Tokyo"    "Geneva-Tokyo" 
utubun
  • 4,400
  • 1
  • 14
  • 17
1

Using expand.grid and then manipulating:

# create all possible combinations
df <- expand.grid(myList, myList)

# ensure only 1 combination for each pair
df <- as.data.frame(unique(t(apply(df, 1, sort))))

# remove same city combinations
df <- subset(df, df$V1 != df$V2)

# create column with pairs
df$combo <- paste0(df$V1, "-", df$V2)
Tom Haddow
  • 230
  • 1
  • 10
0

For Excel - VBA:

We can use a tiny trick:

Since the the lists are identical, we can solve the problem with one list only:

Sub MakePairs()
    Dim i As Long, N As Long, k As Long, j As Long

    N = Cells(Rows.Count, "A").End(xlUp).Row
    k = 1

    For i = 1 To N - 1
        For j = i + 1 To N
            Cells(k, 3).Value = Cells(i, 1).Value & "-" & Cells(j, 1).Value
            k = k + 1
        Next j
    Next i
End Sub

enter image description here

NOTE:

  1. this technique avoids unwanted permutations of existing pairs
  2. this technique avoids replicate pairs like London-London
Gary's Student
  • 95,722
  • 10
  • 59
  • 99
0

You can also do it in excel. Create a new module, paste the following code, update the range in the first 2 lines of code and execute the VBA macro:

Sub combn_VBA()
    'Define variables
    Dim CityListSourceRange As Range: Set CityListSourceRange = Sheet1.Range("A1:A5") '<-- Replace A5:A9 with range address containing the city names
    Dim CityCombinationDestinyRange As Range: Set CityCombinationDestinyRange = Sheet1.Range("C1") '<-- Replace C1 with first cell where you want to place the result list
    Set CityList = CreateObject("Scripting.Dictionary")

    'Copies the source cities into a collection
    For Each CellX In CityListSourceRange
        i = i + 1
        CityList.Add CellX.Value, i
    Next CellX

    'Creates unique pairs
    For Each City1 In CityList
        For Each City2 In CityList
            If CityList(City1) < CityList(City2) Then
                CityCombinationDestinyRange.Offset(j, 0).Value = City1 & "-" & City2
                j = j + 1
            End If
        Next City2
    Next City1
End Sub

To obtain something like this:

result

Mike
  • 144
  • 10
0

We can try to cook our own function to find the combinations of the city names It works slightly faster, comparing to combn():

function

combn2 <- function(x){
  n = length(x)
  paste(
    x[rep.int(seq_along(x)[-n], times = rev(seq_along(x))[-1])], 
    x[unlist(lapply(seq_along(x)[-1], ':', to = n))], 
    sep = '-'
  )
}

check if results are correct

cities  <- list("London", "Paris", "Kyiv", "Geneva", "Tokyo")

combn2(cities)

# [1] "London-Paris"  "London-Kyiv"   "London-Geneva" "London-Tokyo"  "Paris-Kyiv"   
# [6] "Paris-Geneva"  "Paris-Tokyo"   "Kyiv-Geneva"   "Kyiv-Tokyo"    "Geneva-Tokyo" 

compare timing for combn() and combn2() for 5K cities

data
cities <- unique(maps::world.cities$name)

length(cities)

# [1] 41074

cities <- cities[1:5000]
timing for combn()
system.time(
  combn(cities, 2, paste, collapse = "-")
)

#   user  system elapsed 
# 116.02    0.01  116.33 
timing for combn2()
system.time(
  combn2(cities)
)

#  user  system elapsed 
# 14.04    0.00   14.09 

I think the most of the time is consumed by paste(), so if you find the way to nuclearize paste(), I would be very grateful if you let me know how you did it.

utubun
  • 4,400
  • 1
  • 14
  • 17