I have a data frame with a column like this:

 Id   Comment 
 1     \u009cYes yes for ever for ever the boys cried in their ringing voices   with softened faces on 02/14/2016
 2     \u009cYes yes for ever for ever the cried in their ringing voices with softened faces on 01/14/2010
 3     \u009cYes yes for ever for ever t 12/04/2003
 4     \u009c for ever for ever ringing voices  07/02/2002
 5     \u009c for ever for ever ringing softened faces  07/09/2001

How do I use gsub to remove everything from each comment except the word ringing and the date (e.g. 02/14/2016)?

The final column should be like this

Id    Comment
1     ringing 02/14/2016
2     ringing 01/14/2010
3             12/04/2003
4     ringing 07/02/2002
5     ringing 07/09/2001

-----Updated question based on comments from G. Grothendieck, Frank and Dason

Heather Keturah
  • ... how are you determining that those are the results you want? – Dason Feb 17 '16 at 18:21
  • I guess you need at least two example strings to illustrate the nature of the problem. You could just do `y = "ringing 02/14/2016"` if you just have this single string and know exactly what you need to extract from it. – Frank Feb 17 '16 at 18:21
  • @Frank, I have a column with a bunch of rows full of garbage. I just want to keep two things: one is a word (ringing) and the second is anything that's a number or a date, i.e. digits separated by a dot, a slash (`/`) or a dash – Heather Keturah Feb 17 '16 at 18:23
  • @Dason, same thing I said Frank above – Heather Keturah Feb 17 '16 at 18:24
  • Ok, but for us to get an idea of what "a word" is and what the full range of date formats might be, we'll need a more extensive example. If you're not familiar with how regexes work, you might want to take a look at some other folks' examples before forming your questions and answers, like http://stackoverflow.com/q/2192316/1191259 , or at some of the documentation. – Frank Feb 17 '16 at 18:26
  • If you just want to know which rows have ringing and which rows have a date in the indicated form then `transform(DF, has_ringing = grepl("ringing", colX), has_date = grepl("../../....", colX))` assuming `DF` is the data frame and `colX` is the column in question. `regmatches` in R, `strapplyc` in gsubfn and also certain functions in stringr can extract matched strings. – G. Grothendieck Feb 17 '16 at 18:26
  • @G.Grothendieck, I updated my question for what it's worth. I tried your suggestion and it worked, but it created new rows, which is not what I wanted – Heather Keturah Feb 17 '16 at 18:43
  • Can you use `dput` to share the data? – Tyler Rinker Feb 17 '16 at 18:45
  • @TylerRinker, one cell has more than 92kb of text, – Heather Keturah Feb 17 '16 at 18:47
  • No, for the data you showed (5 rows). It's a pain to read in data that has spaces from the data.frame display. `dput` quotes everything nicely. – Tyler Rinker Feb 17 '16 at 18:48
  • For item 3, there is no "ringing" word, yet it seems to be added to the output. Is this what you want (add "ringing" if it doesn't exist)? – steveb Feb 17 '16 at 18:56
  • @steveb, you are right, there is no ringing in the 3rd observation :) fixed it – Heather Keturah Feb 17 '16 at 18:58
  • @HeatherKeturah As TylerRinker suggested, can you include the output of `dput` on your input data, that will make it easier for others to load the data (i.e. it is just cut and paste). – steveb Feb 17 '16 at 19:17
  • @Heather. What I wrote does not create new rows for me. Maybe you did something else? – G. Grothendieck Feb 17 '16 at 21:45
  • I downvoted for failure to supply data in an easily readable format. Request was ignored. It forced @G.Grothendieck to put the data in a readable format in his answer. When your data has spaces in the cells themselves (this is typical with text data) it can not be easily read in. This puts the burden on those helping you which seems unfair. The poster should bear this burden to make their post reproducible. – Tyler Rinker Feb 19 '16 at 13:55
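
For reference, the `regmatches` route mentioned in the comments can be sketched in base R as below. This is only an illustration under stated assumptions: the sample data is a simplified copy of the question's five rows (without the leading garbage character), `keep_re` and `hits` are names made up for this sketch, and the date pattern covers only the mm/dd/yyyy form shown in the question. The dot and dash separators mentioned in the comments would need a wider pattern.

```r
# Simplified copy of the question's data (assumption: Comment is character)
df <- data.frame(
  Id = 1:5,
  Comment = c(
    "Yes yes for ever for ever the boys cried in their ringing voices   with softened faces on 02/14/2016",
    "Yes yes for ever for ever the cried in their ringing voices with softened faces on 01/14/2010",
    "Yes yes for ever for ever t 12/04/2003",
    " for ever for ever ringing voices  07/02/2002",
    " for ever for ever ringing softened faces  07/09/2001"
  ),
  stringsAsFactors = FALSE
)

# Match either the whole word "ringing" or an mm/dd/yyyy date
keep_re <- "\\bringing\\b|[0-9]{2}/[0-9]{2}/[0-9]{4}"

# gregexpr() locates every match in each row; regmatches() extracts them
# as a list with one character vector per row
hits <- regmatches(df$Comment, gregexpr(keep_re, df$Comment, perl = TRUE))

# Join each row's matches with a space; a row with no match becomes ""
df$Comment <- vapply(hits, paste, character(1), collapse = " ")
```

Because the column is overwritten in place, no extra rows or columns are created, and a row without "ringing" simply keeps its date.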

2 Answers

What about:

df <- read.table(text="Id,Comment 
1,\u009cYes yes for ever for ever the boys cried in their ringing voices   with softened faces on 02/14/2016
2,\u009cYes yes for ever for ever the cried in their ringing voices with softened faces on 01/14/2010
3,\u009cYes yes for ever for ever t 12/04/2003
4,\u009c for ever for ever ringing voices  07/02/2002
5,\u009c for ever for ever ringing softened faces  07/09/2001", header=TRUE, sep=",", stringsAsFactors=FALSE)

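# Flag the rows whose comment contains the word "ringing"
df$ringing <- ''
df[grep("ringing", df$Comment), 'ringing'] <- 'ringing'
# Extract the full mm/dd/yyyy date, including the four-digit year
date_re <- "[0-9]{2}/[0-9]{2}/[0-9]{4}"
df[grep(date_re, df$Comment), 'date'] <- regmatches(df$Comment, regexpr(date_re, df$Comment))
# trimws() drops the leading space left when "ringing" is absent
df$res <- trimws(paste(df$ringing, df$date))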
HubertL

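You could use dplyr as follows. There is likely a cleaner way to handle the regex (i.e. without needing the paste). This assumes the data is already in df and that every row contains a date in mm/dd/yyyy form; gsub returns a row without a match unchanged, so such a row would keep its full text.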

library(dplyr)
df %>%
    mutate(Comment = paste0( ifelse(grepl('ringing', Comment), 'ringing ', ''),
                             gsub('^.*(\\d{2}/\\d{2}/\\d{4}).*', '\\1', Comment)))
steveb