R naming convention/tricks for many columns in data.table / data.frame

Question

I have a list of, say, n=10 data.tables (or data.frames).

Performing names(myList) returns the unique table names.

Performing names(myList[[i]]) (for i in 1:n) returns identical output for each value of i - i.e. each data.table has identical column names.

I need to merge all the data tables into one large data table, but would like to preserve the name of the list data.table for each column somehow, in order to keep an overview of where each column originated from.

Is there a trick to doing this, such as giving the columns keys? Or must one just prepend the table name to each of the columns in the final result? This would make the names pretty long in my case.

I want to avoid having to remember (or think about) which columns belong to which table. Just for comparison's sake, I'd like to run str(myBigTable) or summary(myBigTable) and see something like Excel shows here [but vertically displayed in R]:

You can use the `idcol` parameter of `rbindlist` (*data.table*) or the `.id` parameter of `bind_rows` (*dplyr*) which will create an id for each separate dataset in the combined dataset. [See here for an example](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918) — Jaap, Nov 24 '15 at 12:45
@Jaap - This seems like a good way, however I do need the functionality of `merge()`, which I don't (believe I) get with `rbindlist` - if so I do not know how. With the merge function I have the option of adding suffixes, but would prefer your method if possible. I can preprocess to make rbindlist feasible, but was wondering if the newly created column from `idcol` (starting with a **.**) is ignored by analysis functions that I apply to the whole data.table subsequently? — n1k31t4, Nov 24 '15 at 17:06
Stumbled on this old question. This is very do-able, but the question is quite unclear. Please provide a short reproducible example with desired output (3 tables with 3 columns each and just a couple rows should be sufficient). — Gregor Thomas, Dec 08 '15 at 22:47
I ended up just appending the source of each block of 5 columns. So based on the picture above, I have `col1_table1, col2_table1, col3_table1, col4_table1, col5_table1, col1_table2, ...`. I don't think what I wanted is possible while maintaining the functionality, for example, of a data.table. For a good overview, lists can be used (or even environments, then viewed by `ls.str()`). But indexing columns of many data.tables **in** a list becomes tedious. — n1k31t4, Dec 08 '15 at 22:53
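
The `idcol` suggestion from the first comment can be sketched as follows. This is a minimal illustration, not a tested answer to the question: the three small tables, their column names (`id`, `col1`, `col2`) and the column name `source` are all hypothetical stand-ins for `myList`. It stacks the tables row-wise rather than merging them column-wise, so it only fits if a long layout is acceptable.

```r
library(data.table)

# Hypothetical stand-in for myList: three tables with identical column names.
myList <- list(
  table1 = data.table(id = 1:3, col1 = c(10, 20, 30), col2 = c(1.5, 2.5, 3.5)),
  table2 = data.table(id = 1:3, col1 = c(11, 21, 31), col2 = c(4.5, 5.5, 6.5)),
  table3 = data.table(id = 1:3, col1 = c(12, 22, 32), col2 = c(7.5, 8.5, 9.5))
)

# Stack all tables; idcol adds a column holding names(myList).
# Passing a string names the column; idcol = TRUE would call it ".id".
stacked <- rbindlist(myList, idcol = "source")

# The id column is an ordinary character column, so functions applied to the
# whole table will see it. Grouping by it keeps it out of .SD.
per_table <- stacked[, lapply(.SD, mean), by = source, .SDcols = c("col1", "col2")]
print(per_table)
```

Because the id column is not skipped automatically, it has to be excluded explicitly (for example via `by` or `.SDcols`) when a function should only see the value columns.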

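The naming scheme described in the last comment (`col1_table1, col2_table1, ...`) might be produced along these lines. This is a sketch in base R under stated assumptions: the tables share a join column, here called `id` (the question does not name one), the example data is invented, and the table names contain no underscore, since the origin is recovered by splitting on the last `_`.

```r
# Hypothetical stand-in for myList: three data.frames with identical columns.
myList <- list(
  table1 = data.frame(id = 1:3, col1 = c(1, 2, 3), col2 = c(4, 5, 6)),
  table2 = data.frame(id = 1:3, col1 = c(7, 8, 9), col2 = c(10, 11, 12)),
  table3 = data.frame(id = 1:3, col1 = c(13, 14, 15), col2 = c(16, 17, 18))
)

key <- "id"  # assumption: the column all tables are merged on

# Append the table name to every non-key column before merging.
tagged <- Map(function(df, nm) {
  value_cols <- setdiff(names(df), key)
  names(df)[match(value_cols, names(df))] <- paste(value_cols, nm, sep = "_")
  df
}, myList, names(myList))

# Merge all tagged tables pairwise on the key.
myBigTable <- Reduce(function(x, y) merge(x, y, by = key), tagged)

# Overview of which columns came from which table, recovered from the suffix.
value_cols <- setdiff(names(myBigTable), key)
origin <- sub(".*_", "", value_cols)
columns_by_table <- split(value_cols, origin)
str(columns_by_table)
```

The merged table stays a plain data.frame (or data.table, if the inputs are), while `columns_by_table` gives a per-table list of column names that can be used for subsetting, e.g. `myBigTable[, columns_by_table$table2]`.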
0 Answers