Same reason for both: airquality["Ozone"] returns a dataframe, whereas airquality$Ozone returns a vector. class() shows you their object types. str() is also good for succinctly showing you an object.
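A quick way to see this for yourself, using the airquality dataset that ships with R (the names df_col and vec_col are just mine for illustration):

```r
# airquality comes with base R's datasets package, no install needed
df_col  <- airquality["Ozone"]   # single bracket: still a data frame
vec_col <- airquality$Ozone      # dollar: the column itself

class(df_col)    # "data.frame"
class(vec_col)   # "integer"

str(df_col)      # compact summary of the one-column data frame
str(vec_col)     # compact summary of the plain vector
```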
See the help on the '[' operator, which is also known as 'extracting', or the function getElement(). In R, you can call help() on a special character or operator, just surround it with quotes: ?'[' or ?'$' (In Python/C++/Java or most other languages we'd call this 'slicing').
As to why they print differently, print(obj) in R dispatches under the hood to a class-specific print method. In this case: print.data.frame, which prints the dataframe column(s) vertically, with row-indices, vs print.default for a vector, which just prints the vector contents horizontally, with only a positional marker like [1] at the start of each line.
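If you want to convince yourself about the dispatch, here's a rough sketch (x and v are just illustrative names). It only relies on getS3method() and the data frame print method, both of which are in base R:

```r
x <- airquality["Ozone"]   # data frame
v <- airquality$Ozone      # integer vector

# The data frame print method exists as an ordinary S3 method
is.function(getS3method("print", "data.frame"))   # TRUE

# print() on a data frame ends up in print.data.frame:
print(head(x, 3))             # vertical, with row indices
print.data.frame(head(x, 3))  # same thing, calling the method directly

# A plain vector has no special class, so it falls through to print.default
print(head(v, 3))             # [1] 41 36 12
```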
Now back to extraction with the '[' vs '$' operators:
The most important distinction between '[', '[[' and '$' is that the '[' can select more than one element whereas the other two '[[' and '$' select a single element.
There's also a '[[' extract syntax which, like '$', selects a single element (here, a vector):
airquality[["Ozone"]]
[1] 41 36 12 18
The difference between [["colname"]] and $colname is that in the former, the column-name can come from a variable, but in the latter, it must be typed literally, since $ does not evaluate its argument. So [[varname]] would allow you to index different columns depending on the value of varname.
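A small sketch of that difference (varname, via_brackets and via_dollar are just made-up names for the example):

```r
varname <- "Wind"                       # column name held in a variable

via_brackets <- airquality[[varname]]   # looks up the column named "Wind"
via_dollar   <- airquality$Wind         # name has to be typed literally

identical(via_brackets, via_dollar)     # TRUE

# $ does not evaluate varname; it looks for a column literally called
# "varname", finds none, and quietly returns NULL
is.null(airquality$varname)             # TRUE

# So [[ ]] is the one to use in a loop over column names
for (nm in c("Ozone", "Wind")) {
  cat(nm, "has", length(airquality[[nm]]), "values\n")
}
```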
Read the doc (?Extract) about the exact=TRUE and drop=TRUE options. Note that on a dataframe, drop is only respected when you index with a comma (two indices), as for arrays/matrices; in the single-index form it's ignored with a warning:
airquality["Ozone", drop=TRUE]
In `[.data.frame`(airquality, "Ozone", drop = TRUE) :
'drop' argument will be ignored
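For what it's worth, here is how exact and drop behave as far as I can tell. The warning seems specific to the single-index form; with a comma, drop does take effect on a dataframe too. The matrix m is just a toy example of mine:

```r
# exact: [[ ]] matches names exactly by default, $ allows partial matching
is.null(airquality[["Oz"]])                                      # TRUE
identical(airquality[["Oz", exact = FALSE]], airquality$Ozone)   # TRUE
identical(airquality$Oz, airquality$Ozone)                       # TRUE

# drop on a data frame, two-index form
class(airquality[, "Ozone"])                 # "integer"    (dropped to a vector)
class(airquality[, "Ozone", drop = FALSE])   # "data.frame" (kept as one column)

# drop on a matrix
m <- matrix(1:6, nrow = 2)
dim(m[1, ])                 # NULL: the row collapsed to a plain vector
dim(m[1, , drop = FALSE])   # 1 3:  still a 1 x 3 matrix
```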
It's all kinda confusing, offputting at first, eccentrically different and quirkily non-self-explanatory. But once you learn the syntax, it makes sense. Until then, it feels like hitting your head off a wall of symbols.
Please take a very brief skim of R-intro and R-lang#Indexing (HTML or PDF). Bookmark them and come back to them regularly. Read them on the bus or plane...
PS as @Henry mentioned, strictly when accessing a dataframe, we should insert a comma to make explicit that the column-names get applied to columns, not rows: airquality[, "Ozone"]. With numeric indices, airquality[,1] and airquality[1] both select the Ozone column (the comma form drops it to a vector by default, the no-comma form keeps a one-column dataframe), whereas airquality[1,] extracts the first row. With a single index and no comma, R treats the dataframe as a list of columns, which is why airquality["Ozone"] selects a column rather than a row.
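To see what each of those forms actually hands back, here's a small check (by_list, by_comma and first_row are names I've made up):

```r
by_list   <- airquality[1]     # no comma: data frame treated as a list of columns
by_comma  <- airquality[, 1]   # comma: matrix-style indexing, column 1
first_row <- airquality[1, ]   # comma: matrix-style indexing, row 1

class(by_list)     # "data.frame" (one column, Ozone)
class(by_comma)    # "integer"    (dropped to a vector by default)
class(first_row)   # "data.frame" (one row, all six columns)

names(by_list)     # "Ozone"
dim(first_row)     # 1 6
```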
Anyway, it's all in the doc... not necessarily all contiguous or clearly-explained... welcome to R :-)