I had a specific question about turning my data into two columns so I can make an edgelist. I've attached a screenshot of the data. There's up to V10, and each row represents artists that have worked on the same song. I wanted to create an edgelist with the artist names. For example, for rows that have person A, B, C, D, I wanted to create:

A B
A C
A D
B C
B D
C D

The code I used so far is:

reltest <- t(do.call(cbind, lapply(cleanartists[sapply(cleanartists, length) >= 2], combn, 2)))

But this gives me all possible combinations among the artist names, not just the ones that have existing relationships. This is what my data looks like:

 > head(cleanartists, n = 20)
                        V1                        V2              V3              V4   V5   V6   V7   V8   V9  V10
1             Bethel Music              Jenn Johnson            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2                Gal Costa            Caetano Veloso            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3                     JAYZ                Kanye West            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4                     2Pac                 Danny Boy            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5                 Ludacris                   Shawnna            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6         Richard Armitage            The Dwarf Cast            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7                 Ludacris                     TPain            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8   The Velvet Underground                  Lou Reed            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9     The Stanley Brothers  The Clinch Mountain Boys            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10      The Carter Sisters           Mother Maybelle            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
11               Lady Gaga              Colby ODonis            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
12                 Rihanna                      JAYZ            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
13              Lil Yachty              Trippie Redd            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
14              Brian Tuey            James McCawley  Kevin Sherwood  Treyarch Sound <NA> <NA> <NA> <NA> <NA> <NA>
15   Sister Rosetta Tharpe              The Rosettes            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
16             Bing Crosby       The Andrews Sisters            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
17            Stone Poneys            Linda Ronstadt            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
18                  J Cole                     Drake            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
19 The Last Shadow Puppets               Alex Turner      Miles Kane            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
20               Gal Costa            Caetano Veloso            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
hlee28
  • Maybe something like this then just drop the NA values https://stackoverflow.com/questions/35690478/r-combine-multiple-columns-as-pairs-of-column-cells-in-same-row – MrFlick Jun 11 '20 at 19:31
  • Related: https://stackoverflow.com/questions/44894773/reshaping-k-columns-to-2-columns-representing-sequential-pairs-of-the-values-of – MrFlick Jun 11 '20 at 19:33
  • Related: https://stackoverflow.com/questions/45742468/create-edgelist-for-all-interactions-from-data-frame – MrFlick Jun 11 '20 at 19:35
  • Very similar: https://stackoverflow.com/questions/13782132/create-edge-list-from-ragged-data-frame-in-r-for-network-analysis – MrFlick Jun 11 '20 at 19:37

2 Answers

You can use apply() to run your function on every row, keeping only the elements that are not NA. With the approach from here you can then get rid of duplicate pairs, regardless of the order of the two names.

test_data <- data.frame(V1 = c("A", "D"),
                        V2 = c("B", "B"),
                        V3 = c("C", NA),
                        V4 = c("D", NA),
                        stringsAsFactors = FALSE)

combinations <- t(do.call("cbind", apply(test_data, 1, function(x) combn(x[!is.na(x)], 2))))

library(dplyr)
combinations_cleaned <- data.frame(combinations, stringsAsFactors = FALSE) %>%
  mutate(key = paste0(pmin(X1, X2), pmax(X1, X2))) %>%   # order-independent key: (D, B) and (B, D) both give "BD"
  distinct(key, .keep_all = TRUE) %>% 
  select(-key)

combinations_cleaned
  X1 X2
1  A  B
2  A  C
3  A  D
4  B  C
5  B  D
6  C  D
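
One caveat worth noting: apply() simplifies its result to a matrix when every row yields the same number of pairs (for example, when all rows hold exactly two artists), and do.call() then fails because it no longer receives a list. A minimal base R sketch that sidesteps this by looping over row indices with lapply(); the helper name row_pairs is hypothetical, and skipping rows with fewer than two names is an assumption about what is wanted:

```r
# Hypothetical helper: all unordered pairs of the non-NA values in one row
row_pairs <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(NULL)   # assumption: solo rows contribute no edges
  t(combn(x, 2))                    # one pair per row, two columns
}

test_data <- data.frame(V1 = c("A", "D"),
                        V2 = c("B", "B"),
                        V3 = c("C", NA),
                        V4 = c("D", NA),
                        stringsAsFactors = FALSE)

# lapply() always returns a list, whatever the row lengths are
pair_list <- lapply(seq_len(nrow(test_data)),
                    function(i) row_pairs(unlist(test_data[i, ], use.names = FALSE)))
edges <- do.call(rbind, pair_list)

# Sort each pair so that (D, B) and (B, D) compare equal, then drop repeats
edges <- cbind(pmin(edges[, 1], edges[, 2]), pmax(edges[, 1], edges[, 2]))
edges <- unique(edges)
```

Here edges is a two-column character matrix; wrap it in data.frame() if a data frame is needed.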

starja

Sticking with base R functions, but adding magrittr (%>%) to make the code more readable, try this:

# add the pipe (%>%) operator
library(magrittr)

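# tibble is used only to build a small example dataset easily
dtf <- tibble::tribble(
  ~V1, ~V2, ~V3, ~V4, ~V5,
  "A", "B", NA, NA, NA,
  "A", "B", "C", NA, NA,
  "D", "E", "F", NA, NA,
  "F", "G", NA, NA, NA
) %>% as.data.frame()         # base as.data.frame(); as_data_frame() is deprecated and tibble is not attached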


dtf %>% 
  apply(., 1, function(.x){   # for each row in the dataset
    .x[!is.na(.x)] %>%        # as char vector, remove the NA values
      combn(2) %>%            # make combinations of 2 of the elements 
      t() %>%                 # transpose the matrix output of combn
      as.data.frame()         # turn the matrix into a data frame
  }) %>% 
  do.call(rbind, .)           # bind the data frames together

You'll get:

  V1 V2
1  A  B
2  A  B
3  A  C
4  B  C
5  D  E
6  D  F
7  E  F
8  F  G

This is the same as the following code:

# without '%>%' operator
do.call(rbind,apply(dtf, 1, function(.x){as.data.frame(t(combn(.x[!is.na(.x)],2)))}))
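
Note that this output keeps the repeated A B pair. If repeats should instead be counted as edge weights, something like the following base R sketch could work. The column names from, to and weight are my own choice, and treating the pairs as undirected is an assumption:

```r
# Edge list as produced above (rebuilt by hand so the block runs on its own)
edges <- data.frame(V1 = c("A", "A", "A", "B", "D", "D", "E", "F"),
                    V2 = c("B", "B", "C", "C", "E", "F", "F", "G"),
                    stringsAsFactors = FALSE)

# Order each pair alphabetically so direction does not matter (assumption)
sorted <- data.frame(from = pmin(edges$V1, edges$V2),
                     to   = pmax(edges$V1, edges$V2),
                     weight = 1,
                     stringsAsFactors = FALSE)

# Sum the weights per pair: one row per distinct pair, weight = number of repeats
weighted_edges <- aggregate(weight ~ from + to, data = sorted, FUN = sum)
```

Dropping the weight column afterwards leaves one row per distinct pair.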