I want to see if I can visualise who is publishing with whom in peer-reviewed journals for a certain subject. To do this I searched PubMed for the keyword "Barrett's" and downloaded a large file, which gives me two columns, Title and Author:
Auth <- structure(list(Title = structure(c(1L, 4L, 3L, 2L, 5L), .Label = c("A case of Barrett's adenocarcinoma with marked endoscopic morphological changes in Barrett's esophagus over a long follow-up period of 15 years.",
"APE1-mediated DNA damage repair provides survival advantage for esophageal adenocarcinoma cells in response to acidic bile salts.",
"Healthcare Cost of Over-Diagnosis of Low-Grade Dysplasia in Barrett's Esophagus.",
"Radiofrequency ablation coupled with Roux-en-Y gastric bypass: a treatment option for morbidly obese patients with Barrett's esophagus.",
"Risk factors for Barrett's esophagus."), class = "factor"),
Author = structure(c(3L, 5L, 4L, 2L, 1L), .Label = c("Arora Z, Garber A, Thota PN.",
"Hong J, Chen Z, Peng D, Zaika A, Revetta F, Washington MK, Belkhiri A, El-Rifai W.",
"Iwaya Y, Yamazaki T, Watanabe T, Seki A, Ochi Y, Hara E, Arakura N, Tanaka E, Hasebe O.",
"Lash RH, Deas TM Jr, Wians FH Jr.", "Parikh K, Khaitan L."
), class = "factor")), .Names = c("Title", "Author"), row.names = c(NA,
5L), class = "data.frame")
I want to count how many times one author has published with another author. I thought the best way to do this would be to create a co-occurrence matrix (later I'll be using igraph).
I am having trouble understanding how to convert my data into such a matrix. I guess it would involve listing all the authors as both column names and row names, then iterating through each row of the Auth data frame and recording the co-occurrence of each pair of names in the matrix. Is there a quick way to do this? I am lost as to how to approach it, so I tried this:
1. Extract all the names into a long list from the Author column
2. Then create colnames from the Author list
3. Then create rownames from the Author list
4. Then somehow iterate through Auth[2] and count the name co-occurrence
...but I get stuck at the first extraction which I tried with:
AuthSplit<-strsplit(Auth$Author, ",", fixed=T)
AuthSplit<-as.data.frame(AuthSplit)
but I get this error:
Error in data.frame(c("Iwaya Y", " Yamazaki T", " Watanabe T", " Seki A", :
arguments imply differing number of rows: 9, 2, 3, 8, 20, 5, 1, 11, 4, 23, 6, 15, 16, 7, 12, 10, 14, 21, 13, 18, 19, 17, 22
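As far as I can tell, the error arises because strsplit() returns a list with one element per paper, and as.data.frame() tries to turn each element into a column, which only works if every element has the same length. A minimal sketch of keeping the result as a list instead, using two of the Author strings shown above (the names authors and auth_split are just illustrative):

```r
# Two of the Author strings from the sample data above
authors <- c("Parikh K, Khaitan L.",
             "Arora Z, Garber A, Thota PN.")

# strsplit() needs character input; as.character() also covers a factor column
auth_split <- strsplit(as.character(authors), ",", fixed = TRUE)

# One list element per paper, each with its own number of authors
lengths(auth_split)

# Tidy the names: trim surrounding whitespace and drop the trailing full stop
auth_split <- lapply(auth_split, function(x) sub("\\.$", "", trimws(x)))
auth_split[[1]]
```

Because the papers have different numbers of authors, a list (or a long two-column table of paper and author) seems a more natural container here than a wide data frame.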
There must be an easier way?
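One possible approach, sketched here on made-up data rather than the real PubMed export: split the authors, build a paper-by-author incidence matrix with table(), and multiply it by its own transpose with crossprod(). The toy Auth data frame, the author names, and the variable names long, incidence and cooc are all assumptions for illustration.

```r
# Hypothetical toy data with the same two columns as the real file
Auth <- data.frame(
  Title  = c("Paper 1", "Paper 2", "Paper 3"),
  Author = c("Smith A, Jones B, Lee C.", "Smith A, Lee C.", "Jones B."),
  stringsAsFactors = FALSE
)

# Split each Author string, trim whitespace, drop the trailing full stop,
# and guard against a name being listed twice on the same paper
auth_split <- strsplit(as.character(Auth$Author), ",", fixed = TRUE)
auth_split <- lapply(auth_split, function(x) unique(sub("\\.$", "", trimws(x))))

# Long format: one row per (paper, author) pair
long <- data.frame(
  paper  = rep(seq_along(auth_split), lengths(auth_split)),
  author = unlist(auth_split)
)

# Paper-by-author incidence matrix (1 if the author is on the paper)
incidence <- unclass(table(long$paper, long$author))

# Author-by-author counts: entry [i, j] is the number of papers shared by i and j
cooc <- crossprod(incidence)
diag(cooc) <- 0  # ignore each author's co-occurrence with themselves

cooc
```

The resulting matrix is symmetric with author names on both dimensions, so it could presumably be handed to igraph::graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE) to build the network. Note that matching on strings such as "Smith A" treats different people who share a surname and initial as the same author.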