I have the data below. How can I determine which author has the highest number of publications?

I tried this:

   which(status$researchers == max(status$publications))

but it doesn't seem to work.

#PUBLICATIONS

researchers = c("Smith", "Johnson", "Williams", "Brown", "Jones", "Miller", "Davis", "García", "Rodriguez", "Wilson", "Martinez", "Anderson", "Taylor", "Thomas", "Hernandez", "Moore", "Martin", "Jackson", "Thompson", "White", "Lopez", "Lee", "Gonzalez", "Harris", "Clark", "Lewis", "Robinson", "Walker", "Perez", "Hall", "Young", "Allen", "Sanchez", "Wright", "King", "Scott", "Green", "Baker", "Adams", "Nelson", "Hill", "Ramirez", "Campbell", "Mitchell", "Roberts", "Carter", "Phillips", "Evans", "Turner", "Stapel", "Torres", "Parker", "Collins", "Edwards", "Stewart", "Flores", "Morris", "Nguyen", "Murphy", "Rivera", "Cook", "Rogers", "Morgan", "Peterson", "Cooper", "Reed", "Bailey", "Bell", "Gomez", "Kelly", "Howard", "Ward", "Cox", "Diaz", "Richardson", "Wood", "Watson", "Brooks", "Bennett", "Gray", "James", "Reyes", "Cruz", "Hughes", "Price", "Myers", "Long", "Foster ", "Sanders", "Ross", "Morales", "Powell", "Sullivan", "Russell", "Ortiz", "Jenkins", "Gutierrez", "Perry", "Butler", "Barnes", "Fisher", "De Jong", "Jansen", "De Vries", "vd Berg", "Van Dijk", "Bakker", "Janssen", "Visser", "Smit", "Meijer", "De Boer", "Mulder", "De Groot", "Bos", "Smeesters", "Vos", "Peters", "Hendriks", "Van Leeuwen", "Dekker", "Brouwer", "De Wit", "Dijkstra", "Smits", "De Graaf", "Van der Meer", "Muller", "Schmidt", "Schneider", "Fischer", "Meyer", "Weber", "Schulz", "Wagner", "Becker", "Hoffmann", "Wagemakers",  "Molenaar", "Jansen", "White", "Bargh", "Dijksterhuis", "Poldermans", "Kanazawa", "Lynne", "Ling", "Vorst", "Borsboom", "Wicherts")

articles = data.frame(cbind(researchers, publications))
write.table(articles, file = "scientific status.txt", sep = " ")

status = read.table("scientific status.txt", header = TRUE, sep = "", quote = "\"'")     
mats
  • I don't think how you create the data is relevant here, and the `{write,read}.table` steps even less so. It would be a lot more useful if you gave a sample of your data; please refer to http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – flodel Dec 30 '12 at 12:21
  • Well, I thought it would be useful to be able to create the data. – mats Dec 30 '12 at 12:45
  • But what are the contents of `status` ? Unless they are integers, you're unlikely to get any matches. Your `researchers` vector has no numbers so `max` is going to do interesting things with those character strings. – Carl Witthoft Dec 30 '12 at 14:39
  • How are you defining "outlier"? – A5C1D2H2I1M1N2O1R2T1 Dec 30 '12 at 15:29

2 Answers

This is not a general solution, but here you only need to extract the duplicated names:

researchers[duplicated(researchers)]
[1] "Jansen" "White"  ## these 2 authors each have 1 more publication than the others

To see the outliers, you can for example plot the counts:

plot(table(researchers))
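
As a small sketch of why `duplicated()` is not a general solution: it only flags the second and later occurrences of a name, so an author with three articles shows up twice. The `authors` vector below is hypothetical toy data (one entry per article), not the data from the question; `table()` counts every occurrence and keeps ties visible.

```r
# Hypothetical toy data: one entry per article
authors <- c("Smith", "Jansen", "White", "Jansen", "Brown", "White", "Jansen")

# duplicated() marks only the 2nd, 3rd, ... occurrence of each name
dups <- authors[duplicated(authors)]

# table() counts every occurrence of each name
counts <- table(authors)

# Keep every name whose count equals the maximum (handles ties)
top <- names(counts)[counts == max(counts)]

print(dups)
print(counts)
print(top)
```

With real data, replace `authors` with your own vector of names.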


agstudy

It is not clear what your data represents. If it is already aggregated per author, i.e., there is one row per author and the publications column contains the number of publications, do:

status$researchers[which.max(status$publications)]

If instead your data is not aggregated, i.e., there is one row per article, you can do:

tail(sort(table(status$researchers)), 1)
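
To make the two cases concrete, here is a minimal sketch on made-up data. The data frames `status_agg` and `status_raw` and their values are assumptions for illustration only. Note that `which.max()` returns only the first maximum, so a comparison against `max()` is shown as one way to keep ties.

```r
# Hypothetical aggregated data: one row per author
status_agg <- data.frame(
  researchers  = c("Smith", "Jansen", "White"),
  publications = c(12, 30, 30),
  stringsAsFactors = FALSE
)

# which.max() gives the position of the first maximum only
first_top <- status_agg$researchers[which.max(status_agg$publications)]

# Comparing against max() keeps every author tied for the top
all_top <- status_agg$researchers[
  status_agg$publications == max(status_agg$publications)
]

# Hypothetical non-aggregated data: one row per article
status_raw <- data.frame(
  researchers = c("Smith", "Jansen", "White", "Jansen"),
  stringsAsFactors = FALSE
)

# Count rows per author, sort ascending, keep the last (largest) entry
top_raw <- tail(sort(table(status_raw$researchers)), 1)

print(first_top)
print(all_top)
print(top_raw)
```

Both approaches assume `publications` is numeric; if it was read in as text, convert it with `as.numeric()` first.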
flodel
  • Thanks. This helps. And what about the situation where I want to know the name of the researcher who published, say, 30 articles? – mats Dec 30 '12 at 12:19
  • If your data is already aggregated, `subset(status, publications >= 30)`. If it is not aggregated, `which(table(researchers) >= 30)`. – flodel Dec 30 '12 at 12:23
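
A quick sketch of the threshold lookup from the last comment, again on invented data (`status_agg`, `articles`, and the cutoff of 2 for the toy vector are assumptions). It shows both "at least 30" and "exactly 30", since the question could be read either way.

```r
# Hypothetical aggregated data: one row per author
status_agg <- data.frame(
  researchers  = c("Smith", "Jansen", "White"),
  publications = c(12, 30, 45),
  stringsAsFactors = FALSE
)

# Rows for authors with at least 30 publications
at_least_30 <- subset(status_agg, publications >= 30)

# Names of authors with exactly 30 publications
exactly_30 <- status_agg$researchers[status_agg$publications == 30]

# Hypothetical non-aggregated data: one entry per article
articles <- c("Smith", "Jansen", "White", "Jansen", "White", "Jansen")
counts <- table(articles)

# which() keeps the names, so names() returns the matching authors
at_least_2 <- names(which(counts >= 2))

print(at_least_30)
print(exactly_30)
print(at_least_2)
```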