To understand my question, I should first point out that R data.tables aren't just R data.frames with syntactic sugar; there are important behavioral differences: column assignment/modification by reference in a data.table avoids copying the whole object in memory (see the example in this Quora answer), whereas with a data.frame the object is copied.
I've found on multiple occasions that the speed and memory differences that arise from data.table's behavior are a crucial element that allows one to work with some big datasets where it wouldn't be possible with data.frame's behavior.
Therefore, what I'm wondering is: in Python, how do Pandas' dataframes behave in this regard?
Bonus question: if Pandas' dataframes are closer to R data.frames than to R data.tables, and have the same downside (a full copy of the object when assigning/modifying a column), is there a Python equivalent to R's data.table package?
EDIT per comment request : Code examples :
R dataframes :
# renaming a column
colnames(mydataframe)[1] <- "new_column_name"
R datatables :
# renaming a column
library(data.table)
setnames(mydatatable, 'old_column_name', 'new_column_name')
In Pandas :
mydataframe.rename(columns = {'old_column_name': 'new_column_name'}, inplace=True)
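
To make the question concrete, here is a minimal sketch of how one might check empirically whether the Pandas rename above copies the column data. The example data and the second column name (`other`) are made up for illustration, and the check relies on the assumption that `Series.to_numpy()` returns a view of the underlying buffer for a plain float64 column. The result may differ between Pandas versions (for example with copy-on-write enabled), so this is only a probe, not a statement of how Pandas is guaranteed to behave:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
mydataframe = pd.DataFrame({
    "old_column_name": np.arange(1_000_000, dtype="float64"),
    "other": np.arange(1_000_000, dtype="float64"),
})

# Keep a handle on the column's buffer and the object's identity before renaming.
before = mydataframe["old_column_name"].to_numpy()
object_id_before = id(mydataframe)

mydataframe.rename(columns={"old_column_name": "new_column_name"}, inplace=True)

# Look at the same column again under its new name.
after = mydataframe["new_column_name"].to_numpy()

# inplace=True keeps the same Python object; whether the underlying
# buffer is reused is what np.shares_memory reports.
same_object = id(mydataframe) == object_id_before
shares_buffer = np.shares_memory(before, after)

print("same Python object:", same_object)
print("column buffer shared:", shares_buffer)
```

If `shares_buffer` comes out `True`, the rename did not duplicate that column's data in this particular setup; it says nothing about other operations such as adding or modifying a column, which would need their own check.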