I want to select columns from a data frame by name. The names of the columns are in a separate list, but the names in the list and the names of the columns are not exactly the same.

So here's my code:

list.of.names <- c('Var_1', 'Var_2')

But the column names look like this: 'Var.1', 'Var.2'

I tried it with this:

new.df <- old.df %>% select(c(list.of.names))

Is there any function that does not distinguish between '.' and '_'?

Thanks for the help!

Konrad Rudolph
maduba
  • Maybe `select(starts_with('Var'))`? – Rui Barradas Oct 14 '22 at 15:14
  • There's not really a comparison function that treats `.` and `_` the same. You can use something like `gsub()` to replace all the `_` with `.` in a vector if you like. You need to make sure the names match exactly. – MrFlick Oct 14 '22 at 15:14
  • @MrFlick No need to modify the names, just use `grep` for comparison. It's basically the same, just cutting out one unnecessary intermediate step. – Konrad Rudolph Oct 14 '22 at 15:15
  • Or you could turn your names into regex patterns... `select(matches("Var[-.]\\d"))`, you can be as specific or as general as you want to be. – Gregor Thomas Oct 14 '22 at 15:16
  • I would simply rename either the columns of the data frame or the names in the list using `gsub`. Type `?gsub` in your console to see how this function is used. – Dimitrios Zacharatos Oct 14 '22 at 15:22

1 Answer

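If I understand correctly, you need the names in the list to match the column names exactly. As MrFlick said, use `gsub` to replace the `_` with `.`

gsub("_", ".", list.of.names) should work. Note that it returns a new vector rather than changing list.of.names in place, so pass its result to select().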
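
Put together with the selection step from the question, this could look like the sketch below. The contents of `old.df` and the name `fixed.names` are made up for illustration; only the column names come from the question. The `fixed = TRUE` argument is optional here and just makes `gsub` treat `"_"` as a literal string rather than a regular expression.

```r
# Hypothetical example data; only the column names matter here.
old.df <- data.frame(Var.1 = 1:3, Var.2 = 4:6, Other = 7:9)
list.of.names <- c('Var_1', 'Var_2')

# Replace every "_" with "." so the names match the columns exactly.
fixed.names <- gsub("_", ".", list.of.names, fixed = TRUE)

# Base R subsetting by name. With dplyr loaded, the equivalent would be
#   new.df <- old.df %>% select(all_of(fixed.names))
new.df <- old.df[, fixed.names, drop = FALSE]
```

Using `all_of()` in the dplyr version should make `select()` fail with an error if one of the names is missing, instead of silently selecting fewer columns.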

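However, I would recommend not using periods in column names, or in the names of anything other than functions, because periods already carry meaning in function names (S3 methods such as print.data.frame) and mixing the two uses gets confusing. I always use _ in variable names, and I believe that is common practice; the tidyverse style guide, for one, recommends it. Others can correct me if I am wrong.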
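
If you would rather follow that advice and change the column names instead of the list, a minimal sketch could look like this. The data in `old.df` is again made up for illustration. Here `fixed = TRUE` is not optional: as a regular expression, `"."` matches any character, so without it every character in every name would be replaced.

```r
# Hypothetical example data; only the column names matter here.
old.df <- data.frame(Var.1 = 1:3, Var.2 = 4:6)
list.of.names <- c('Var_1', 'Var_2')

# Rename the columns: replace the literal "." with "_".
names(old.df) <- gsub(".", "_", names(old.df), fixed = TRUE)

# The names in the list now match the columns as they are.
new.df <- old.df[, list.of.names, drop = FALSE]
```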

This discussion gives more details: "Replace specific characters within strings".