The following are the first few lines of my script:

library(tidyverse)
library(caret)

# Read and clean up the data
ugriz <- read.table(
    "F:/Users/Jeremy Moss/Dropbox/Astro at VUW/PhD stuff/data_files/optical_data/QSOs_1st_50k.dat-mags.dat")
ugriz[ugriz == -999] <- NA
fields <- c('name', 'z','delta_z','NED_class','SDSS_class','no_radio','radio_max','no_UV', 'UV_min',
    'u', 'g', 'r', 'i', 'z_mag', 'I', 'J', 'H', 'K', 'W1', 'SPIT_5', 'W2', 'SPIT_8', 'W3', 'W4', 'NUV', 'FUV')
names(ugriz) <- fields
mags <- fields[10:26]

head(ugriz)
ugriz['SDSSJ094736.54-005905.6',]

The output is

> head(ugriz)
                     name       z delta_z NED_class SDSS_class no_radio radio_max no_UV UV_min      u      g      r
1 SDSSJ094736.54-005905.6 0.65218 0.00013       QSO        QSO        0        12     7   14.8 20.078 19.679 19.585
2 SDSSJ094745.26-004113.2 2.83059 0.00061       QSO        QSO        0        12     4   14.8 21.468 19.695 19.343
3 SDSSJ094532.67-010003.3 3.03664 0.00037       QSO        QSO        0        12    10   14.8 20.443 19.115 18.918
4 SDSSJ094545.11-003921.6 1.47375 0.00044       QSO        QSO        0        12     9   14.8 19.723 19.593 19.353
5 SDSSJ094703.31+000228.9 1.77191 0.00170       QSO        QSO        0        12     7   14.8 19.683 19.393 19.273
6 SDSSJ094454.24-004330.3 2.28794 0.00032       QSO        QSO        0        12     7   14.8 19.769 19.233 19.083
       i  z_mag      I      J      H      K     W1 SPIT_5     W2 SPIT_8     W3    W4    NUV    FUV
1 19.406 19.370 19.023     NA     NA     NA 14.970     NA 13.992     NA 11.507 8.756 20.122 20.736
2 18.923 18.650 18.539 16.925 16.333 15.817     NA     NA     NA     NA     NA    NA     NA 20.511
3 18.749 18.698 18.365     NA     NA     NA 14.638     NA 14.041     NA 11.174 8.751 21.646 21.456
4 19.175 19.258 18.791 18.368 17.701 17.340 15.193     NA 14.354     NA 11.328    NA 21.122 21.090
5 18.995 18.950 18.611 18.148 17.831 17.233 15.545     NA 14.530     NA 11.573    NA 22.465 20.091
6 18.983 18.768 18.599 17.840 17.121 16.170 14.978     NA 14.224     NA 11.489    NA 21.684 20.314

> ugriz['SDSSJ094736.54-005905.6',]
   name  z delta_z NED_class SDSS_class no_radio radio_max no_UV UV_min  u  g  r  i z_mag  I  J  H  K W1 SPIT_5 W2 SPIT_8
NA <NA> NA      NA      <NA>       <NA>       NA        NA    NA     NA NA NA NA NA    NA NA NA NA NA NA     NA NA     NA
   W3 W4 NUV FUV
NA NA NA  NA  NA

head(ugriz) gives the expected output, but when I reference a particular row with ugriz['SDSSJ094736.54-005905.6',] (the first row, in this case, but it happens for all), I get all NAs. Why is that?

Jim421616

1 Answer

A character string in the row position of ugriz[ , ] is matched against the row names, not against the values in any column. You can see the row names with

rownames(ugriz)
# [1] "1" "2" "3" "4" ....

In this case your row names are "1", "2", "3", etc., so ugriz["1", ] would work. Here the row names happen to match the row index, but that can change, for example after you subset the data. Row names are not the same thing as a column named name. If you want to subset by a column value in base R, use

ugriz[ugriz$name=='SDSSJ094736.54-005905.6',]

or

subset(ugriz, name=='SDSSJ094736.54-005905.6')
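
To see the difference side by side, here is a minimal sketch on a toy data frame. The data frame `df` and its values are made up for illustration and only stand in for `ugriz`.

```r
# Toy data frame standing in for ugriz (names and values are hypothetical)
df <- data.frame(
  name = c("A1", "B2", "C3"),
  z    = c(0.65, 2.83, 3.04),
  stringsAsFactors = FALSE
)

# Default row names are just the row positions as strings
rownames(df)

# A string in the row position is matched against row names.
# No row is named "A1", so every cell of the result is NA.
by_rowname <- df["A1", ]

# Matching on the value of the name column instead
by_value  <- df[df$name == "A1", ]
by_subset <- subset(df, name == "A1")
```

Both `by_value` and `by_subset` hold the single row whose `name` is "A1", while `by_rowname` is the same kind of all-NA row shown in the question.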
MrFlick
  • As far as I can see, I'm following the nomenclature as found at http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice under "Numeric Indexing" then "Name indexing". What's the difference between there and what I'm doing? – Jim421616 Jun 29 '20 at 23:39
  • You don't have row names. You have a column named "name". Note how in that other page the row names column doesn't have a header/column name. You can set row names after import if you like with `rownames(ugriz)<-ugriz$name` . But if you plan on using any `tidyverse` functions, be aware that those functions ignore/drop row names, preferring all relevant data to be in a proper column. – MrFlick Jun 29 '20 at 23:43
  • Jim421616, don't confuse that page's sentence *"We can retrieve a row by its name"* with *"We can retrieve a row by `"name"`"*, where I'm using `code` formatting to suggest the name of a field in the frame. R doesn't check the frame for columns named `name`; if you try to index on a string (even if the string is `"1"`), it looks for `rownames(ugriz)` that match the string(s) you provide, it does not look for a column named `"name"`; if you try to index on a number, it `trunc`ates that number and gives you that row number, regardless of its row name or the presence of a column named `"name"`. – r2evans Jun 29 '20 at 23:46
  • And I second MrFlick's caution against row names: tidyverse packages not only make no attempt to save or use them, often there is the intentional removal of them (compare `rownames(mtcars)` and `rownames(as_tibble(mtcars))` to see this in action). If you need the row names, preserve them as a column in the data, perhaps with `tibble::rownames_to_column` (or similar), otherwise don't count on them. – r2evans Jun 29 '20 at 23:48
  • Ah, and I can't use `rownames(ugriz)<-ugriz$name` because some of my rows are duplicates. I'll ask about merging them in a separate question. Thank you. – Jim421616 Jun 29 '20 at 23:49
  • That's good to know ... tread carefully with `dplyr` then. Good luck! – r2evans Jun 29 '20 at 23:52
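
The points raised in these comments can be checked with a short sketch. It assumes the `tibble` package is installed (it ships with the tidyverse). The data frame `dup` and the column name `model` are hypothetical and chosen only for illustration.

```r
library(tibble)

# Duplicate values cannot be used as row names: the assignment raises an error
dup <- data.frame(name = c("A1", "B2", "A1"), z = c(0.65, 2.83, 0.66))
res <- tryCatch(
  { rownames(dup) <- dup$name; "ok" },
  error = function(e) "error"
)

# A numeric row index is truncated, so 1.9 selects row 1
first <- mtcars[1.9, ]

# Converting to a tibble drops the row names...
tb <- as_tibble(mtcars)
rownames(tb)[1:3]

# ...so keep them as an ordinary column if you need them later
kept <- rownames_to_column(mtcars, var = "model")
kept$model[1]
```

Here `res` ends up as "error" because of the repeated "A1", and `kept` carries the car names in a regular `model` column that tidyverse functions will preserve.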