1

I'm new to R and need to solve this problem: I have a data frame whose column names follow this pattern:

#Example
¦ 1.1 ¦ 1.2 ¦ 1.3 ¦ 2.1 ¦ 2.2 ¦ 2.3 ¦ 3.1 ¦ 3.2 ¦ 3.3 ¦

How can I delete all the columns in the data frame whose names meet this condition:

#Suppose x.y colname
(x.y) 
if x>y => delete column 

After:

¦ 1.1 ¦ 1.2 ¦ 1.3 ¦ 2.2 ¦ 2.3 ¦ 3.3 ¦

Here's the output of dput(head(x)), where x is my df (posted as an image, not reproduced here). "Código UNU" is just an ID.

I tried with grep but couldn't get it to work. Any help would be greatly appreciated!

importm
  • R *really* doesn't like names like that, are you sure they look exactly like that? Please include the output from `dput(head(x))` (where `x` is your `data.frame`) to be sure. Otherwise, *"I tried with grep"* is a good start but it would be even more helpful if you included what you tried. – r2evans Aug 20 '19 at 21:26
  • Tried with `grep` how? [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. Also, you say "delete column" but you tagged "delete row". – camille Aug 20 '19 at 21:29
  • Paste as text please, not image. – zx8754 Aug 20 '19 at 21:40

4 Answers

5

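You can use base R to index the columns based on this criterion. The example uses a named numeric vector for brevity; the same logical index also selects columns of a data frame: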

names = as.character(c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3))

df = setNames(rnorm(n = length(names)), names)

df
#        1.1        1.2        1.3        2.1        2.2        2.3        3.1        3.2        3.3 
# -0.4685751 -0.1085529 -0.5613519 -1.0906374  1.0530686 -0.8101930 -0.6015732  1.3895373 -0.6977108 

wrangle <- function(x) {
  parts <- strsplit(x, split = ".", fixed = TRUE)[[1]] # split "x.y" at the literal dot
  left <- as.integer(parts[1])
  right <- as.integer(parts[2])
  left <= right # compare as numbers, not strings; TRUE if the column should be kept
}

df[unlist(lapply(names, wrangle))] # index with the logical vector (values below come from a different random draw than the printout above)
#        1.1        1.2        1.3        2.2        2.3        3.3 
# -0.9873006  0.6089725  0.2823161  0.3397318 -0.3136084  0.2270087
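
One detail worth checking with this approach is whether the two halves of each name are compared as text or as numbers. The sketch below uses made-up column names with a two-digit part (they are not from the question) to show how the two comparisons can disagree; the variable names are only illustrative.

```r
# Hypothetical names, chosen so that one half has two digits
nms <- c("9.10", "10.9", "2.2")

# Split each name at the literal dot and pull out both halves as strings
parts <- strsplit(nms, ".", fixed = TRUE)
left  <- vapply(parts, `[`, character(1), 1)
right <- vapply(parts, `[`, character(1), 2)

# Character comparison: strings are ordered as text, so in typical
# locales "10" sorts before "9"
keep_chr <- left <= right

# Numeric comparison: convert first, then compare
keep_int <- as.integer(left) <= as.integer(right)

print(data.frame(name = nms, keep_chr = keep_chr, keep_int = keep_int))
```

With single-digit names like those in the question the two comparisons should agree, so the conversion only matters once a part reaches 10 or more.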
Dij
3

Here is a trick if you don't have too many column names:

pattern <- unlist(sapply(1:10, function(i) paste0(i,'.',i:10)))

df[names(df) %in% pattern]

That is, you build the set of names that meet your condition, then keep only the columns whose names are in that set.
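
As a rough, self-contained illustration of that idea, the sketch below builds a one-row data frame like the one in the question (the values 1 to 9 are placeholders) and limits the generated names to 1 through 3. The variable names `allowed` and `kept` are just for this example.

```r
# Placeholder data: one row, columns named 1.1 ... 3.3
df <- setNames(data.frame(matrix(1:9, nrow = 1)),
               c("1.1", "1.2", "1.3", "2.1", "2.2", "2.3", "3.1", "3.2", "3.3"))

# Every name "i.j" with i <= j, for i and j between 1 and 3
allowed <- unlist(lapply(1:3, function(i) paste0(i, ".", i:3)))

# Keep only the columns whose names appear in the allowed set
kept <- df[names(df) %in% allowed]
print(names(kept))
```

Note that any column whose name is not in the generated set, such as an ID column, would also be dropped by this filter.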

989
1

Using tidyverse: reshape wide-to-long, split the column name into its two parts, filter rows, then reshape back long-to-wide:

# example data
df = setNames(data.frame(matrix(1:9, nrow = 1)), 
              as.character(c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3)))

df
#   1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3
# 1   1   2   3   4   5   6   7   8   9

library(tidyverse)

gather(df) %>% 
  separate(key, into = c("x", "y"), remove = FALSE, convert = TRUE) %>% 
  filter(x <= y) %>% 
  select(-c(x, y)) %>% 
  spread(key = "key", value = "value")

#   1.1 1.2 1.3 2.2 2.3 3.3
# 1   1   2   3   5   6   9
zx8754
1

In base R, using sub, we can compare the number before the decimal point with the one after it and select the matching columns. Using @zx8754's data:

df[as.integer(sub("\\..*", "", names(df))) <= as.integer(sub(".*\\.", "", names(df)))]

#  1.1 1.2 1.3 2.2 2.3 3.3
#1   1   2   3   5   6   9
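
The question mentions an ID column alongside the x.y columns. A possible variation, sketched below under the assumption that such a column should always be kept, applies the comparison only to names that match the x.y pattern. The column name `ID`, its value, and the helper variables are made up for illustration.

```r
# Placeholder data: an ID column plus columns named 1.1 ... 3.3
df <- data.frame(ID = "A1", matrix(1:9, nrow = 1))
names(df) <- c("ID", "1.1", "1.2", "1.3", "2.1", "2.2", "2.3", "3.1", "3.2", "3.3")

nms <- names(df)

# Which names look like "digits.digits"?
is_xy <- grepl("^[0-9]+\\.[0-9]+$", nms)

# Extract both parts as integers, only for the matching names
x <- as.integer(sub("\\..*", "", nms[is_xy]))
y <- as.integer(sub(".*\\.", "", nms[is_xy]))

# Keep every non-x.y column, and x.y columns only when x <= y
keep <- !is_xy
keep[is_xy] <- x <= y

result <- df[keep]
print(names(result))
```

Restricting the regular expressions to matching names also avoids the `NA` values that `as.integer()` would produce for a name like `ID`.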
Ronak Shah