I work with R, but I have not come across a case where I had to apply a single comparison operator to an entire dataframe. While comparing a pandas DataFrame with an R dataframe, I noticed that the result of df[df > 0] differs between Python and R.

In Python, the result of df[df > 0] is another DataFrame, whereas in R the result is a vector.

Python Code:

import numpy as np
import pandas as pd
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,5), ['A', 'B', 'C', 'D', 'E'], ['V', 'W', 'X' , 'Y', 'Z'])

df[df > 0]

        V           W           X           Y           Z
A   2.706849839 0.628132709 0.907969446 0.503825754 0.651117948
B   NaN         NaN         0.605965349 NaN         0.740122057
C   0.528813494 NaN         0.188695309 NaN         NaN
D   0.955056509 0.190794322 1.978757324 2.60596728  0.683508886
E   0.302665449 1.693722925 NaN         NaN         NaN
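
In pandas, indexing a DataFrame with a boolean DataFrame of the same shape keeps the original shape and masks the cells where the condition is False, which is why the result above contains NaN. A minimal sketch, assuming a small hand-written frame (the name `small` and its values are illustrative, not the random frame above), suggesting that this behaves like `DataFrame.where`:

```python
import pandas as pd

# Illustrative data (hypothetical, not the random frame from the question).
small = pd.DataFrame(
    [[1.5, -2.0], [-0.5, 3.0]],
    index=["A", "B"],
    columns=["V", "W"],
)

masked = small[small > 0]            # boolean-DataFrame indexing
via_where = small.where(small > 0)   # explicit form of the same masking

print(masked)
print(masked.equals(via_where))      # equals() treats NaNs in the same cells as equal
```

Both expressions keep the row and column labels, so the positions of the positive values are preserved.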

R Code:

> set.seed(101)
> df = data.frame(matrix(rnorm(25), 5, 5))
> df
          X1         X2         X3         X4         X5
1 -0.3260365  1.1739663  0.5264481 -0.1933380 -0.1637557
2  0.5524619  0.6187899 -0.7948444 -0.8497547  0.7085221
3 -0.6749438 -0.1127343  1.4277555  0.0584655 -0.2679805
4  0.2143595  0.9170283 -1.4668197 -0.8176704 -1.4639218
5  0.3107692 -0.2232594 -0.2366834 -2.0503078  0.7444358
> df[df > 0]
 [1] 0.5524619 0.2143595 0.3107692 1.1739663 0.6187899 0.9170283 0.5264481 1.4277555 0.0584655 0.7085221 0.7444358
> 

Could someone explain the significance of the difference in how R and Python return the result? Also, is there a way in R to get a dataframe as the result of df[df > 0]?

Karthik S
  • very weird question. What exactly do you need with df[df>0] in both R or python? In R it returns a vector because these are the elements in the data.frame that are > 0 – StupidWolf Jun 26 '20 at 08:10

1 Answer

I am not clear on the "significance" part, but if you want the same output in R as in Python, you can assign NaN to the values that are less than or equal to 0.

set.seed(101)
df = data.frame(matrix(rnorm(25), 5, 5))
df[df <= 0] <- NaN
df

#         X1        X2        X3        X4        X5
#1       NaN 1.1739663 0.5264481       NaN       NaN
#2 0.5524619 0.6187899       NaN       NaN 0.7085221
#3       NaN       NaN 1.4277555 0.0584655       NaN
#4 0.2143595 0.9170283       NaN       NaN       NaN
#5 0.3107692       NaN       NaN       NaN 0.7444358
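
Going the other way, R's flat vector can be approximated in pandas by indexing the underlying NumPy array with a boolean array. A hedged sketch, assuming illustrative data (the `small` frame is hypothetical): NumPy collects the matches row by row, whereas R goes column by column, so the transpose is used here to match R's ordering.

```python
import pandas as pd

# Illustrative data (hypothetical, chosen so the two orderings differ).
small = pd.DataFrame([[1.5, 2.0], [0.5, -3.0]], columns=["X1", "X2"])

values = small.to_numpy()
mask = values > 0

row_major = values[mask]        # NumPy order: row by row
col_major = values.T[mask.T]    # R-like order: column by column

print(row_major.tolist())
print(col_major.tolist())
```

In both cases the row and column labels are lost, just as in the R vector.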
Ronak Shah
  • By significance, I just wanted to understand why R outputs a vector instead of a dataframe. – Karthik S Jun 26 '20 at 08:17
  • Why do you think it should return a dataframe? Because Python does it? For me, returning a vector seems more natural (maybe because I use R a lot) than implicitly turning the values into `NaN`. For R, most of the "why" is answered in `?Extract`. – Ronak Shah Jun 26 '20 at 08:24