Replacing string data frame

Question

I have a file like this

1880.1.1    74
1881.1.1    74
1882.1.1    75
1883.1.1    79
1884.1.1    111
1885.1.1    145

and I want to create a dataframe like this

1880    1    1  74
1881    1    1  74
1882    1    1  75
1883    1    1  79
1884    1    1  111
1885    1    1  145

but my attempts with the gsub function fail. Many thanks!

You have to escape the period. Try: `gsub("\\."," ","1880.1.1")` – David, Sep 09 '13 at 14:51
Since you didn't show us how your `gsub` is failing, I'm going to guess you aren't escaping the `.`. It should look like `gsub('\\.', ...)`. However, I don't think `gsub` is the function you want. Instead, look at `strsplit`, and please share more of the code that you have tried. – Justin, Sep 09 '13 at 14:52
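
To illustrate the two comments above: in a regular expression an unescaped `.` matches any character, so the period has to be escaped or matched literally with `fixed = TRUE`. A minimal sketch on a single value from the question (the variable names are just for illustration):

```r
x <- "1880.1.1"

# Unescaped "." is a regex wildcard, so every character gets replaced
unescaped <- gsub(".", " ", x)

# Escaping the period matches only the literal dots
escaped <- gsub("\\.", " ", x)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed
literal <- gsub(".", " ", x, fixed = TRUE)

print(unescaped)  # eight spaces
print(escaped)    # "1880 1 1"
print(literal)    # "1880 1 1"
```

Note that this only changes the separator inside one string. Getting separate columns still needs a splitting step, which is what the answers below do.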

score 5 · Accepted Answer · answered Sep 09 '13 at 14:56

You can use concat.split from my "splitstackshape" package for a more convenient way to do what you're trying to do. Assuming your data.frame is called "mydf" and the first column is called "V1", you can do:

> library(splitstackshape)
> concat.split(mydf, "V1", sep = ".", drop = TRUE)
   V2 V1_1 V1_2 V1_3
1  74 1880    1    1
2  74 1881    1    1
3  75 1882    1    1
4  79 1883    1    1
5 111 1884    1    1
6 145 1885    1    1

Here, "mydf" is defined as:

mydf <- structure(list(V1 = c("1880.1.1", "1881.1.1", "1882.1.1", "1883.1.1", 
  "1884.1.1", "1885.1.1"), V2 = c(74L, 74L, 75L, 79L, 111L, 145L)), 
  .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -6L))

The equivalent in base R is to use something like the following:

> cbind(read.table(text = as.character(mydf$V1), sep = "."), mydf[-1])
    V1 V2 V3  V2
1 1880  1  1  74
2 1881  1  1  74
3 1882  1  1  75
4 1883  1  1  79
5 1884  1  1 111
6 1885  1  1 145
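
The duplicated `V2` name in that output can be avoided by naming the split fields directly. A small sketch of the same base R idea, assuming the three pieces are a year, a month and a day (the question does not say so; the names `year`, `month`, `day` and `value` are made up here):

```r
mydf <- data.frame(
  V1 = c("1880.1.1", "1881.1.1", "1882.1.1"),
  V2 = c(74L, 74L, 75L),
  stringsAsFactors = FALSE
)

# read.table parses each string as one line; col.names labels the fields
# (the labels are illustrative assumptions, not taken from the question)
parts <- read.table(text = mydf$V1, sep = ".",
                    col.names = c("year", "month", "day"))

# Attach the remaining column under an explicit name
result <- cbind(parts, value = mydf$V2)
print(result)
```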

score 2 · Answer 2 · Jilber Urbina · 2013-09-09T15:13:46.330

Although Ananda's base R solution is simpler and nicer, here's another approach using strsplit:

> data.frame(t(sapply(strsplit(mydf[,"V1"], "\\." ), as.numeric)), X4=mydf[, "V2"])
    X1 X2 X3  X4
1 1880  1  1  74
2 1881  1  1  74
3 1882  1  1  75
4 1883  1  1  79
5 1884  1  1 111
6 1885  1  1 145

edited Sep 09 '13 at 15:13

answered Sep 09 '13 at 15:05

Jilber Urbina


I did not know `as.numeric` would coerce the data to a matrix. Thanks for the lesson! – dayne Sep 09 '13 at 15:12

@dayne, it's not the `as.numeric` that's coercing to a `matrix`. You can have almost anything there that won't change the values (`c`, `as.vector`, ...). It's just that `sapply` will simplify to a `matrix` whenever possible (as it was in this case). – A5C1D2H2I1M1N2O1R2T1 Sep 09 '13 at 15:17
@AnandaMahto Thanks! I really should have known that. In this case is `sapply` or `mapply` more appropriate? They both seem to behave identically, using either the `as.numeric` or `cbind`/`rbind` approach. – dayne Sep 09 '13 at 15:24
@dayne, Not sure, really. Probably depends on how you define "more appropriate" :). I don't know which function is more efficient. I haven't used `mapply` much. – A5C1D2H2I1M1N2O1R2T1 Sep 09 '13 at 15:27
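
The point made in these comments, that `sapply` does the simplifying rather than `as.numeric`, can be checked with a short sketch. It uses two of the strings from the question; `identity` stands in for "any function that won't change the values":

```r
pieces <- strsplit(c("1880.1.1", "1881.1.1"), ".", fixed = TRUE)

# lapply always returns a list, one element per input string
as_list <- lapply(pieces, as.numeric)

# sapply simplifies equal-length results into a matrix,
# with one column per input string (hence the t() in the answer)
as_matrix <- sapply(pieces, as.numeric)

# Swapping in identity still gives a matrix, only of character type
as_char_matrix <- sapply(pieces, identity)

print(is.list(as_list))
print(dim(as_matrix))
print(typeof(as_char_matrix))
```

So `as.numeric` only affects the type of the values; the matrix shape comes from `sapply` itself.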

score 1 · Answer 3 · answered Sep 09 '13 at 15:05

Here is a strsplit approach. I used @Ananda's data.

> data.frame(t(mapply(cbind,strsplit(mydf[,1],split='[.]'))),mydf[,2])
    X1 X2 X3 mydf...2.
1 1880  1  1        74
2 1881  1  1        74
3 1882  1  1        75
4 1883  1  1        79
5 1884  1  1       111
6 1885  1  1       145
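
One thing to watch with this approach: the split pieces are never converted, so `X1` to `X3` hold text (character, or factor under older defaults) rather than numbers. A hedged variation, not the answerer's code, that row-binds the pieces and converts them before building the data frame:

```r
mydf <- data.frame(V1 = c("1880.1.1", "1881.1.1"), V2 = c(74L, 74L),
                   stringsAsFactors = FALSE)

# Row-bind the split pieces into a character matrix, one row per string
parts <- do.call(rbind, strsplit(mydf$V1, ".", fixed = TRUE))

# Convert the values to numbers while keeping the matrix shape
storage.mode(parts) <- "double"

# An unnamed matrix gets the default column names X1, X2, X3
result <- data.frame(parts, V2 = mydf$V2)
str(result)
```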
