I have a data frame whose entries are either plain numbers or numbers separated by "." and I want to change each entry depending on whether it contains a ".". If an entry does not contain a ".", the prefix "-" should be added. That part is simple using subsetting or grep. But I also want to replace each entry that contains a "." with a running counter of the "."-entries seen so far, counting row by row.

my example data:

X1      X2 
1       2  
3       4
6       8
5       1.2
3.4     7
1.2.5   9
11      3.4.7
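
For anyone who wants to reproduce this, the example can be built roughly as follows. This is only a sketch: the name `df` and the choice of character columns are assumptions (entries such as 1.2.5 cannot be stored as numbers), chosen to match what the answers below work with.

```r
# Example data as character columns; `df` is an assumed name
df <- data.frame(
  X1 = c("1", "3", "6", "5", "3.4", "1.2.5", "11"),
  X2 = c("2", "4", "8", "1.2", "7", "9", "3.4.7"),
  stringsAsFactors = FALSE  # keep strings as character, not factor
)

# Which entries contain a literal "."?
has_dot <- sapply(df, function(col) grepl(".", col, fixed = TRUE))
print(has_dot)
```
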

and I would like to have it look like this:

X1      X2 
-1       -2  
-3       -4
-6       -8
-5       1
2        -7
3        -9
-11      4

I have no clue how to do this. I have already tried subsetting and extracting the "." parts to count them, but I cannot insert the counter. Thanks for your help.

Sotos
Miguel123
  • because it's the third and 4th time a "." appears – Miguel123 Nov 11 '16 at 14:18
  • Yes I got it. Look at my answer below – Sotos Nov 11 '16 at 14:19
  • Yes, thanks! :) Also a nice solution, although I'm not familiar with sapply. And regarding your question: what would the code look like if we want to check the numbers of the "."-entry and replace it with the row number where that combination appeared above? So that means: 1.2 => 1, 3.4 => 2, 1.2.5 => 4, 3.4.7 => 5? – Miguel123 Nov 11 '16 at 14:34
  • Here is [a link about apply family](http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega). I am not sure I understand what you mean with row-number – Sotos Nov 11 '16 at 14:38
  • Something like this: check for "."; if present, find the row where those numbers already appeared and replace the "."-entry with the number of that row. – Miguel123 Nov 11 '16 at 14:44
  • @Sotos he means that 1.2 is the combination of X1 and X2 at row 1, and 3.4.7 is the combination of X1 and X2 at row 5. – timat Nov 11 '16 at 14:46
  • oh, well...this really complicates things – Sotos Nov 11 '16 at 14:48
  • @Miguel123 I would say try it yourself first, and if you do not find a solution, post your attempt in another question. You can, for instance, create a column with the X1.X2 combination and another with the X2.X1 combination, compare them with your number, and keep the row where they match – timat Nov 11 '16 at 14:50
  • can do something with `paste` maybe – Sotos Nov 11 '16 at 14:51
  • Edited my answer. Have a look – Sotos Nov 11 '16 at 15:23

2 Answers

Here is an idea via base R,

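# Running count of "."-entries across both columns, row by row
ind <- rowSums(sapply(df, function(i) cumsum(grepl('\\.', i))))
# Replace "."-entries with that count; prefix all other entries with "-"
df[] <- lapply(df[], function(i) ifelse(grepl('\\.', i), ind, paste0('-', i)))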

df
#   X1 X2
#1  -1 -2
#2  -3 -4
#3  -6 -8
#4  -5  1
#5   2 -7
#6   3 -9
#7 -11  4

NOTE: I converted df to character beforehand,

df[] <- lapply(df[], as.character)
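
To see why `ind` works as a counter, it may help to look at the intermediate values. The sketch below rebuilds the example data (the data frame construction and the name `per_col` are assumptions added for illustration) and prints the per-column cumulative counts before they are summed.

```r
# Example data rebuilt as character columns (assumed construction)
df <- data.frame(
  X1 = c("1", "3", "6", "5", "3.4", "1.2.5", "11"),
  X2 = c("2", "4", "8", "1.2", "7", "9", "3.4.7"),
  stringsAsFactors = FALSE
)

# Per-column running count of entries that contain a literal "."
per_col <- sapply(df, function(i) cumsum(grepl("\\.", i)))
print(per_col)

# Summing across columns gives one counter for the whole data frame
ind <- rowSums(per_col)
print(ind)
# 0 0 0 1 2 3 4
```

Note that this relies on each row holding at most one "."-entry, as in the example. If both columns of a row contained a ".", both entries would receive the same number.
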

EDIT

Regarding your row-number request, this should do it (starting again from the original character df),

ind1 <- apply(df, 1, function(i) paste(sort(i), collapse = '.'))
df2 <- sapply(df, function(i) match(i, ind1))
df[] <- lapply(df[], function(i) ifelse(grepl('\\.', i), 0, paste0('-', i)))
df[!is.na(df2)] <- df2[!is.na(df2)]
df
#   X1 X2
#1  -1 -2
#2  -3 -4
#3  -6 -8
#4  -5  1
#5   2 -7
#6   4 -9
#7 -11  5
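
The intermediate objects make the matching step easier to follow. This is a sketch on the example data (the data frame construction is an assumption added for illustration): `ind1` holds one "X1.X2" key per row, and `df2` records the row whose key equals a given entry.

```r
# Example data rebuilt as character columns (assumed construction)
df <- data.frame(
  X1 = c("1", "3", "6", "5", "3.4", "1.2.5", "11"),
  X2 = c("2", "4", "8", "1.2", "7", "9", "3.4.7"),
  stringsAsFactors = FALSE
)

# One key per row: the two entries, sorted as strings and joined by "."
ind1 <- apply(df, 1, function(i) paste(sort(i), collapse = "."))
print(ind1)

# For every entry, the row whose key equals it (NA if there is none)
df2 <- sapply(df, function(i) match(i, ind1))
print(df2)
```

Because `sort()` compares the entries as strings here, keys for rows with multi-digit numbers may not come out in numeric order, so this should be checked against real data before relying on it.
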

If you are planning on doing calculations with this data frame later on, then you should convert to integer, i.e.,

df[] <- lapply(df[], as.integer)

str(df)
#'data.frame':  7 obs. of  2 variables:
# $ X1: int  -1 -3 -6 -5 2 4 -11
# $ X2: int  -2 -4 -8 1 -7 -9 5
Sotos

Here it is with data.table. The idea is to create a counter in a temporary column:

library(data.table)

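dt <- data.table(df)
# Work on character columns so "." can be matched literally
dt$X1 <- as.character(dt$X1)
dt$X2 <- as.character(dt$X2)
# Prefix entries without a "." with "-"
dt[!grepl(".", dt$X1, fixed=TRUE), X1 := paste("-", X1, sep="")]
dt[!grepl(".", dt$X2, fixed=TRUE), X2 := paste("-", X2, sep="")]
# Number the rows that contain a "." in either column
dt[grepl(".", dt$X1, fixed=TRUE) | grepl(".", dt$X2, fixed=TRUE), count_point := as.character(sequence(.N))]
# Replace the "."-entries with that counter
dt[grepl(".", dt$X1, fixed=TRUE), X1 := count_point]
dt[grepl(".", dt$X2, fixed=TRUE), X2 := count_point]
# Drop the temporary column and convert back to a data frame
df <- data.frame(dt[, c("X1", "X2"), with = FALSE])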

There should be a way to do it in fewer lines, using .SD
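
One possible shorter version with `.SD` might look like the sketch below. It is an illustration, not a tested replacement: the data construction and the names `cols`, `any_dot` and `counter` are assumptions, and it relies on each row containing at most one "."-entry.

```r
library(data.table)

# Example data as character columns (assumed construction)
dt <- data.table(
  X1 = c("1", "3", "6", "5", "3.4", "1.2.5", "11"),
  X2 = c("2", "4", "8", "1.2", "7", "9", "3.4.7")
)
cols <- c("X1", "X2")

# TRUE for rows where any column contains a literal "."
any_dot <- Reduce(`|`, lapply(dt, grepl, pattern = ".", fixed = TRUE))
# Running number of such rows
counter <- cumsum(any_dot)

# Update all columns in one step: counter for "."-entries, "-" prefix otherwise
dt[, (cols) := lapply(.SD, function(x)
  ifelse(grepl(".", x, fixed = TRUE), as.character(counter), paste0("-", x))),
  .SDcols = cols]

print(dt)
```

This avoids the temporary column by keeping the counter in an ordinary vector outside the table.
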

timat