
I want to transform a data.frame into an adjacency matrix. My data contains articles and their authors (one column per coauthor), and each row is one article. I want authors of the same article to be tied.

The data structure is now like this:

data <- data.frame(Author1 = c("Alan", "Rebecca", "Micheal", "Dany", "Euron", "Alan"),
                   Author2 = c("Rebecca", NA, "Alan", "Euron", NA, "Dany"),
                   Author3 = c("Dany", NA, "Euron", "Micheal", NA, NA),
                   Author4 = c("Euron", NA, "Rebecca", NA, NA, NA),
                   Title = c("Eric's boat", "Top 100 boats", "Boats in the World", "Death and boats", "Boats and Dragons", "Boats"))

Which gives this output:

+---------+---------+---------+---------+--------------------+
| Author1 | Author2 | Author3 | Author4 | Title              |
+---------+---------+---------+---------+--------------------+
| Alan    | Rebecca | Dany    | Euron   | Eric's boat        |
| Rebecca | NA      | NA      | NA      | Top 100 boats      |
| Micheal | Alan    | Euron   | Rebecca | Boats in the World |
| Dany    | Euron   | Micheal | NA      | Death and boats    |
| Euron   | NA      | NA      | NA      | Boats and Dragons  |
| Alan    | Dany    | NA      | NA      | Boats              |
+---------+---------+---------+---------+--------------------+

I want it to look like this:

+---------+------+---------+---------+------+-------+
|         | Alan | Rebecca | Micheal | Dany | Euron |
+---------+------+---------+---------+------+-------+
| Alan    |    0 |       1 |       0 |    1 |     1 |
| Rebecca |    1 |       0 |       1 |    0 |     0 |
| Micheal |    1 |       1 |       1 |    1 |     1 |
| Dany    |    1 |       0 |       1 |    0 |     1 |
| Euron   |    1 |       0 |       1 |    1 |     0 |
+---------+------+---------+---------+------+-------+
ecl
  • Hi E. CL, it would be very useful if you shared your data set in a reproducible way. Please read these suggestions: https://stackoverflow.com/help/reprex and here: https://stackoverflow.com/questions/49994249/example-of-using-dput – Scipione Sarlo May 17 '19 at 13:54
  • Thanks, is it better now? – ecl May 17 '19 at 14:10
  • Yes, now it's perfect! – Scipione Sarlo May 17 '19 at 14:13
  • Is the expected output correct based on the input? – akrun May 17 '19 at 14:15
  • I think your question is a duplicate of this one: https://stackoverflow.com/questions/41214012/r-creating-an-adjacency-matrix-from-columns-in-a-dataframe – Scipione Sarlo May 17 '19 at 14:16
  • Possible duplicate of [r creating an adjacency matrix from columns in a dataframe](https://stackoverflow.com/questions/41214012/r-creating-an-adjacency-matrix-from-columns-in-a-dataframe) – Scipione Sarlo May 17 '19 at 14:21
  • In addition to the post @ScipioneSarlo linked, there's https://stackoverflow.com/q/17525231/5325862 and several others. Maybe you can show what you've tried so far and explain how this problem differs from previous posts on creating adjacency matrices. – camille May 17 '19 at 14:39
  • The outcomes requested in the two questions you shared with me are not the same as what I want. I need an adjacency matrix where authors (rows) who wrote the article/text/book together are indicated. The suggested duplicates result in a matrix where the rows and columns indicate different things. I want one where they indicate the same thing (row 5 and column 5 show the same information). I'm sorry for any beginner mistakes here on Stack Overflow, it's my first time posting. I'm trying to learn when you correct me. – ecl May 17 '19 at 15:26
  • Yeah I think it's more complicated than those posts. That's why I'm interested to see if there's any way you've cobbled together pieces of those approaches or what your thought process is so far. The way I might approach it is pretty circuitous, but there might be context I'm missing or an approach that makes more sense for you – camille May 17 '19 at 16:23
  • What I've been trying to do is create a data.frame where all the articles of an author are on the same row (so each author occurs only once, in one row, with all of his/her titles in the following columns). My idea was to create an edge list and then work from there. But I got stuck on how to turn the data into an edge list. – ecl May 17 '19 at 20:13
  • Your output doesn't seem right, though: every author appears with every other author at least once. So if you're trying to make an adjacency matrix, it should be all ones. For example, you have a 0 for Dany --> Rebecca and Rebecca --> Dany, but they coauthor "Eric's Boat". Check that the data sample is actually representative of the problem. – camille May 18 '19 at 14:36
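
The edge-list idea mentioned in the comments can be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions, not an accepted answer: it assumes the author columns are exactly those whose names start with "Author", that a tie is binary (1 if two people share at least one article), and that the diagonal stays 0. The names `author_cols`, `authors_by_article`, `edge_list` and `adj` are made up for illustration.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps names as character.
data <- data.frame(
  Author1 = c("Alan", "Rebecca", "Micheal", "Dany", "Euron", "Alan"),
  Author2 = c("Rebecca", NA, "Alan", "Euron", NA, "Dany"),
  Author3 = c("Dany", NA, "Euron", "Micheal", NA, NA),
  Author4 = c("Euron", NA, "Rebecca", NA, NA, NA),
  Title   = c("Eric's boat", "Top 100 boats", "Boats in the World",
              "Death and boats", "Boats and Dragons", "Boats"),
  stringsAsFactors = FALSE
)

# Assumption: every author column is named "Author<something>".
author_cols <- grep("^Author", names(data), value = TRUE)

# One character vector of the non-missing authors per article.
authors_by_article <- lapply(seq_len(nrow(data)), function(i) {
  a <- unlist(data[i, author_cols], use.names = FALSE)
  unique(a[!is.na(a)])
})

# Edge list: every unordered pair of coauthors within an article.
# Single-author articles contribute no pairs.
pairs <- lapply(authors_by_article, function(a) {
  if (length(a) < 2) return(NULL)
  t(combn(a, 2))
})
edge_list <- do.call(rbind, pairs)
colnames(edge_list) <- c("from", "to")

# Square matrix with the same authors on rows and columns.
authors <- unique(unlist(authors_by_article))
adj <- matrix(0L, nrow = length(authors), ncol = length(authors),
              dimnames = list(authors, authors))

# A two-column character matrix indexes by (row name, column name).
adj[edge_list] <- 1L
adj[edge_list[, 2:1]] <- 1L  # mirror each pair so the matrix is symmetric

adj
```

With the sample data every pair of authors shares at least one article, so all off-diagonal cells come out as 1, which is consistent with the comment above about the expected output. If tie counts are wanted instead of 0/1, one option is to tabulate the rows of `edge_list` rather than assigning 1.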

0 Answers