Many dataframes, different row lengths, similar columns and dataframe titles, how to bind?

Question

This takes a bit to explain and the post itself may be a bit too long to be answered.

I have MANY data frames of individual chess players and their specific ratings at points in time.

Here is what my data looks like. Please forgive me for my poor formatting of separating the datasets. Carlsen and Nakamura are separate dataframes.

Player1

 Nakamura, Hikaru Year
             2364 2001-01-01
             2430 2002-01-01
             2520 2003-01-01
             2571 2004-01-01
             2613 2005-01-01
             2644 2006-01-01
             2651 2007-01-01
             2670 2008-01-01
             2699 2009-01-01
             2708 2010-01-01
             2751 2011-01-01
             2759 2012-01-01
             2769 2013-01-01
             2789 2014-01-01
             2776 2015-01-01
             2787 2016-01-01

Player2

          Carlsen, Magnus Year
                   2127   2002-01-01
                   2279   2003-01-01
                   2484   2004-01-01
                   2553   2005-01-01
                   2625   2006-01-01
                   2690   2007-01-01
                   2733   2008-01-01
                   2776   2009-01-01
                   2810   2010-01-01
                   2814   2011-01-01
                   2835   2012-01-01
                   2861   2013-01-01
                   2872   2014-01-01
                   2862   2015-01-01
                   2844   2016-01-01

You can download the two sets here:

Download Player2 Download Player1

Between the data above and the result below, I've deleted two columns and used an observation (the player's name) as a column title.

Hikaru Nakamura/Magnus Carlsen's chess rating over time

Hikaru's data is assigned to a dataframe, Player1. Magnus's data is assigned to a dataframe, Player2.

What I want to be able to do is get what you see below, a dataframe of them combined.

The code I used to produce this frame is

 merged<- merge(Player1, Player2, by = c("Year"), all = TRUE)

Now, this is all fine and dandy for two data sets, but I am having a lot of difficulty adding more players to this combined data set.

For example, I might like to add 5, 10, or 15 more players to this set, such as Kramnik, Anand, and Gelfand (other famous chess players). As you'd expect, for 5 players the dataframe would have 6 columns, for 10 it would have 11, and for 15 it would have 16, all ordered nicely by the Year variable.

Fortunately, the number of observations for each player is always less than 100. Also, each individual player is assigned his/her own dataset.

For example,

 Nakamura is the Player1 dataframe
 Carlsen is the Player2 dataframe
 Kramnik is the Player3 dataframe
 Anand is the Player4 dataframe
 Gelfand is the Player5 dataframe

all of which I created with a for loop and assign(), using this code:

for (i in 1:nrow(as.data.frame(unique(Timed_set_filtered$Name)))) {
  assign(paste("Player",i,sep=""), subset(Timed_set_filtered, Name == unique(Timed_set_filtered$Name)[i]))
}
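A possible alternative to creating one global variable per player is to let split() build a named list in one step. This is only a sketch: the small data frame below is a made-up stand-in for Timed_set_filtered (assumed here to have at least the columns Name, Rating and Year), and `players` is a hypothetical name.

```r
# Toy stand-in for Timed_set_filtered; the real data has more rows
# and possibly more columns (this layout is an assumption).
Timed_set_filtered <- data.frame(
  Name   = c("Nakamura, Hikaru", "Nakamura, Hikaru", "Carlsen, Magnus"),
  Rating = c(2364, 2430, 2127),
  Year   = as.Date(c("2001-01-01", "2002-01-01", "2002-01-01")),
  stringsAsFactors = FALSE
)

# One data frame per player, stored in a list named by player.
players <- split(Timed_set_filtered, Timed_set_filtered$Name)

names(players)
nrow(players[["Nakamura, Hikaru"]])
```

Keeping the pieces in a list from the start avoids the later mget(ls(...)) step, and the list names already carry the player names.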

I don't want to write out something like below:

 merged<- merge(Player1, Player2,.....Player99 ,Player100, by = c("Year"), all = TRUE)

I want to be able to merge all 5, 10, 15... i of the Player"i" objects that I created in the loop together by Year.

Also, once it leaves the loop initially, each dataset looks like this.

So what ends up happening is that I assign all of the data sets to a list by using the following snippet:

 lst <- mget(ls(pattern='^Player\\d+'))
 list2env(lapply(lst,`[`,-2), envir =.GlobalEnv)
 lst <- mget(ls(pattern='^Player\\d+'))

for (i in 1:nrow(as.data.frame(unique(Timed_set_filtered$Name)))) {
  names(lst[[i]]) [names(lst[[i]]) == 'Rating'] <- eval(unique(Timed_set_filtered$Name)[i])
}

This is what my list looks like.

Is there a way to build a table merged on Year, so that each of the Player"i" dataframes in my list, which are not necessarily equal in length, is combined (with cbind, bind_cols, merge, etc.) into a merged set like the one you saw below the merge(Player1, Player2) code?

Here is the diagram again, but it would have to be for many players, not just Carlsen and Nakamura.

Also, is there a way I can avoid using the list function, and just straight up do

names(Player"i") [names(Player"i") == 'Rating'] <- eval(unique(Timed_set_filtered$Name)[i])

which just renames the Rating column of every dataframe whose name starts with "Player", and then

merge(Player1, Player2, Player3, ...., Player99, Player100, by = c("Year"), all = TRUE)

which would merge all of the Player"i" datasets?
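One way to get both steps without typing every dataframe name is to keep the players in a named list, rename each Rating column from the list names, and fold merge() over the list with Reduce(). A minimal sketch, assuming each dataframe has just the columns Rating and Year; the values are abbreviated from the tables above and `players` is a hypothetical name:

```r
# Toy list of per-player data frames (abbreviated from the question).
players <- list(
  "Nakamura, Hikaru" = data.frame(
    Rating = c(2364, 2430, 2520),
    Year   = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01"))
  ),
  "Carlsen, Magnus" = data.frame(
    Rating = c(2127, 2279),
    Year   = as.Date(c("2002-01-01", "2003-01-01"))
  )
)

# Rename the Rating column of each data frame to the player's name.
renamed <- Map(function(df, player) {
  names(df)[names(df) == "Rating"] <- player
  df
}, players, names(players))

# Merge pairwise by Year; all = TRUE keeps years that some players lack.
merged <- Reduce(function(x, y) merge(x, y, by = "Year", all = TRUE), renamed)
merged
```

Because every Rating column is renamed before merging, merge() has no clashing column names to disambiguate with suffixes.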

If anything is unclear, please mention it.

It's great that you have provided an example, but it is a bit on the long / confusing side. Perhaps try to reduce it to a minimal example that shows the problem and expected outcome. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example gives some pointers — user20650, Feb 21 '16 at 22:27
An easy way would be to just use `dplyr`'s `rbind_list` to create a dataframe with all rows stacked on top of each other, assuming that the columns are meaningfully identical (see also `bind_rows`), and then use `reshape2`'s `dcast` to reshape the data. — coffeinjunky, Feb 21 '16 at 22:27
You could get a list of all the data frames with player in the name, using `lst <- mget(ls(pattern="Player"))` and then merge them using http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list?answertab=votes#tab-top (sorry if I misunderstand) — user20650, Feb 21 '16 at 22:28
I apologize about its length, but it is very difficult for me to describe this problem without explaining it in depth. Thanks for the link! — InfiniteFlash, Feb 21 '16 at 22:31
Basically, if I understand correctly, you want to merge all the Players into 1 dataframe (their names being headers) and another column Year? — CuriousBeing, Feb 21 '16 at 23:08
Yes, that seems about right. Players may not have played before an arbitrary year, so they have different lengths (due to not playing before, say, the year 2002, there is no data for 2000 and 2001). — InfiniteFlash, Feb 21 '16 at 23:17
With the Reduce(merge) code, it works nicely, BUT unfortunately it duplicates the Name and Rating columns, turning Rating, Rating, Rating into Rating.x, Rating.y, Rating.z, and I'll have to figure out how to rename those to just the player names — InfiniteFlash, Feb 24 '16 at 18:43
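The stack-then-reshape idea from the comments can be sketched without any merging at all, which also sidesteps the Rating.x / Rating.y suffixes. The sketch below uses base R's reshape() as a stand-in for `dcast` to stay dependency-free. It assumes a long table with the columns Name, Rating and Year, which is what the original Timed_set_filtered appears to be; the data frame `long` is a made-up example.

```r
# Long format: one row per player and year (toy values from the question).
long <- data.frame(
  Name   = c(rep("Nakamura, Hikaru", 3), rep("Carlsen, Magnus", 2)),
  Rating = c(2364, 2430, 2520, 2127, 2279),
  Year   = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01",
                     "2002-01-01", "2003-01-01")),
  stringsAsFactors = FALSE
)

# Wide format: one row per Year, one Rating column per player.
wide <- reshape(long, idvar = "Year", timevar = "Name",
                v.names = "Rating", direction = "wide")

# reshape() names the new columns "Rating.<player>"; drop the prefix.
names(wide) <- sub("^Rating\\.", "", names(wide))

# Order rows by Year.
wide <- wide[order(wide$Year), ]
wide
```

Years in which a player has no rating come out as NA, which matches the behaviour of merge(..., all = TRUE).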

score 0 · Answer 1 · answered Feb 24 '16 at 18:37

It was pretty funny that one line of code did the trick. After I put all of Player1, Player2, ..., Player"i" into the list, I just joined all of the sets contained in the list by Year.

For loop that generates all of the unique datasets.

for (i in 1:nrow(as.data.frame(unique(Timed_set_filtered$Name)))) {
  assign(paste("Player",i,sep=""), subset(Timed_set_filtered, Name == unique(Timed_set_filtered$Name)[i]))
}

Puts them into a list

 lst <- mget(ls(pattern='^Player\\d+'))

Merge, or join by common value

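library(plyr)  # join_all() comes from the plyr package
df <- join_all(lst, by = 'Year')

Unfortunately, unlike merge(datasets...., all = TRUE), it drops certain observations. A likely reason is that join_all() defaults to type = 'left', which keeps only the Year values present in the first dataframe in the list; passing type = 'full' should keep them all, though I still need to check this.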
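A left join would explain rows going missing (plyr's join_all() uses type = "left" unless told otherwise). The difference can be seen on two small frames. This sketch uses base merge() as a stand-in for the plyr join: all.x = TRUE behaves like a left join and all = TRUE like a full join. The values are abbreviated from the tables in the question, and the variable names are made up for illustration.

```r
carlsen <- data.frame(
  Year    = as.Date(c("2002-01-01", "2003-01-01")),
  Carlsen = c(2127, 2279)
)
nakamura <- data.frame(
  Year     = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01")),
  Nakamura = c(2364, 2430, 2520)
)

# Left join: only years present in the first (left) frame survive,
# so Nakamura's 2001 rating is dropped.
left_join_like <- merge(carlsen, nakamura, by = "Year", all.x = TRUE)

# Full join: every year from either frame is kept, with NA where missing.
full_join_like <- merge(carlsen, nakamura, by = "Year", all = TRUE)

nrow(left_join_like)  # 2
nrow(full_join_like)  # 3
```

With a left join the result depends on which dataframe comes first in the list, so a player whose history starts late can silently truncate everyone else's.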
