
I want to convert a Python pandas DataFrame to a data frame in R. I found a library for this, rpy2, described here:

http://pandas.pydata.org/pandas-docs/stable/r_interface.html

But I couldn't find a method for saving the result or transferring it to R.

First I tried to_csv:

df_R = com.convert_to_r_dataframe(df_total)
df_R.to_csv(direc+"/qap/detail_summary_R/"+"distance_"+str(gp_num)+".csv",sep = ",")

But it gives me an error:

"AttributeError: 'DataFrame' object has no attribute 'to_csv'  "

So I checked the object's type, which was

<class 'rpy2.robjects.vectors.DataFrame'>

How can I save an object of this type to a CSV file, or transfer it to R?

JonghoKim

3 Answers


If standard text-based formats (csv) are too slow or bulky, I'd recommend feather, a serialization format built on Apache Arrow. It was explicitly developed by the creators of RStudio/ggplot2/etc (Hadley Wickham) and pandas (Wes McKinney) for performance and interoperability between Python and R (see here).

You need pandas version 0.20.0+ and pip install feather-format; then you can use the to_feather/read_feather operations as drop-in replacements for to_csv/read_csv:

import pandas as pd

# df is the pandas DataFrame; the rpy2-converted object has no to_feather method
df.to_feather('filename.feather')
df = pd.read_feather('filename.feather')

The R equivalents (using the package feather) are

df <- feather::read_feather('filename.feather')
feather::write_feather(df, 'filename.feather')

Besides some minor tweaks (e.g. you can't save custom DataFrame indexes in feather, so you'll need to call df.reset_index() first), this is a fast and easy drop-in replacement for csv, pickle, etc.
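
As a minimal sketch of the reset_index() step mentioned above: the data, the column names, and the helper save_for_r are made up for illustration, and writing the file assumes pyarrow is installed.

```python
import pandas as pd

# Hypothetical data: a small frame with a custom, named index.
df = pd.DataFrame(
    {"distance": [1.5, 2.0, 3.25]},
    index=pd.Index(["a", "b", "c"], name="group"),
)

# reset_index() moves the index into an ordinary column and restores
# the default 0..n-1 index, which is what Feather can store.
flat = df.reset_index()


def save_for_r(frame, path):
    """Hypothetical helper: write `frame` to a Feather file (needs pyarrow)."""
    frame.to_feather(path)
```

Calling save_for_r(flat, 'filename.feather') should then produce a file that the R side can read with read_feather, with the former index available as the regular column group.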

EDIT (June 2022): feather development has moved to arrow, so use the arrow library instead of feather.

library(arrow)
df <- arrow::read_feather('filename.feather')
buhtz
jayelm
    As of March 2020, feather / pyarrow does not support dataframes wrapping sparse matrices. – Ben Whale Mar 03 '20 at 22:34
    Does it need to write to disk, or is it possible to use it to serialize from Python to binary or text (e.g. a field in a SQL Server table) and deserialize in R? – dawid Mar 03 '22 at 14:31

The recent documentation https://rpy2.github.io/doc/v3.2.x/html/generated_rst/pandas.html has a section about interacting with pandas.

Otherwise, objects of type rpy2.robjects.vectors.DataFrame have a method to_csvfile, not to_csv:

https://rpy2.github.io/doc/v3.2.x/html/vector.html#rpy2.robjects.vectors.DataFrame.to_csvfile

If you want to pass data between Python and R, there are more efficient ways than writing and reading CSV files. Try the conversion system:

from rpy2.robjects import pandas2ri
pandas2ri.activate()

from rpy2.robjects.packages import importr

base = importr('base')
# call an R function on a Pandas DataFrame
base.summary(my_pandas_dataframe)
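
If you would rather not activate the conversion globally, a scoped alternative might look like the sketch below. It assumes the localconverter pattern shown in the rpy2 v3.2.x pandas documentation; newer rpy2 releases may prefer a different entry point, and the function name pandas_to_r is made up for illustration.

```python
def pandas_to_r(df):
    """Hypothetical helper: convert a pandas DataFrame to an R data.frame.

    Assumes rpy2 3.x and a working R installation.
    """
    # Imported lazily so this module still loads where rpy2/R is absent.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # The pandas conversion rules apply only inside this block.
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.py2rpy(df)
```

The returned object should be an rpy2.robjects.vectors.DataFrame, so it can be passed to R functions or saved with to_csvfile.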
lgautier

Once you have your data.frame, you can save it using write.table or one of its wrappers, for example write.csv.

In rpy2 :

import rpy2.robjects as robjects
## get a reference to the R function 
write_csv = robjects.r('write.csv')
## save 
write_csv(df_R,'filename.csv')
agstudy