
I would like to extract every piece of text that contains digits, such as "US6184521-B1" and "US3967255-A", from the following string:

US6184521-B1 -- US3967255-A   DELPHIAN FOUNDATION (DELP-Non-standard);  Q2 CORP (QTWO-Non-standard)   OLIVER S M,  PROUD R A,  PARSONS S J;  US3973118-A   LAMONTAGNE J A (LAMO-Individual)   LAMONTAGNE J A;  US4303855-A   IBM CORP (IBMC)   BAPST U H,  GFELLER F,  VETTIGER P;  US4394572-A   BIOX TECH INC (BIOX-Non-standard)   WILBER S;  US4407290-A   BIOX TECH INC (BIOX-Non-standard);  BOC GROUP PLC (BRTO)   WILBER S A;  US4633087-A   TREBOR INDS INC (TREB-Non-standard)   ROSENTHAL G K,  STEPHENS J D,  ROSENTHAL R D;  US4678921-A   NIPPONDENSO CO LTD (NPDE)   NAKAMURA T,  SATO S,  HATTORI T,  NABETA T,  KATO M;  US4864126-A   HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERYESZI J,  PETRILLA J F,  PERNYESZI J;  US4865038-A   NOVAMETRIX MED SYST INC (NOVA-Non-standard)   RICH D,  THOMAS S;  US4907594-A   NICOLAY GMBH (NICO-Non-standard)   MUZ E;  US4939375-A   HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERNYESZI J,  PETRILLA J F;  US5036437-A   LECTRON PRODUCTS IN (LECT-Non-standard)   MACKS H R;  US5209230-A   NELLCOR INC (NELL-Non-standard)   SWEDLOW D B,  WARING J,  DELONZO R;  US5237994-A   SQUARE ONE TECHNOLOGY (SQUA-Non-standard)   GOLDBERGER D S;  US5239169-A   MICROSCAN SYSTEMS INC (MICR-Non-standard)   THOMAS J E;  US5325192-A   TEKTRONIX INC (TEKT)   ALLEN D W;  US5373102-A   US SEC OF ARMY (USSA)   DAVENPORT W E,  EHRLICH J J,  TAYLOR T S;  US5561295-A   LITTON SYSTEMS INC (LITO)   PREIS M K,  JACKSEN N F;  US5629517-A   XEROX CORP (XERO)   JACKSON W B,  BIEGELSEN D K,  STREET R A,  WEISFIELD R L;  US5752914-A   NELLCOR PURITAN BENNETT INC (MLCW)   DELONZOR R,  NAMY A;  US5786592-A   HOEK INSTR AB (HOEK-Non-standard)   HOEK B

This is similar to what is shown here, but I want to extract the letters together with the digits, not the digits alone. How can I achieve this in R?

Amleto

2 Answers


Try this:

  test <- c("aa1", "aaa")
  # Use [0-9] rather than [1-9] so that a 0 also counts as a digit
  test[grepl("[0-9]", test)]
[1] "aa1"

With your data:

input<-"US6184521-B1 -- US3967255-A   DELPHIAN FOUNDATION (DELP-Non-standard);  Q2 CORP (QTWO-Non-standard)   OLIVER S M,  PROUD R A,  PARSONS S J;  US3973118-A   LAMONTAGNE J A (LAMO-Individual)   LAMONTAGNE J A;  US4303855-A   IBM CORP (IBMC)   BAPST U H,  GFELLER F,  VETTIGER P;  US4394572-A   BIOX TECH INC (BIOX-Non-standard)   WILBER S;  US4407290-A   BIOX TECH INC (BIOX-Non-standard);  BOC GROUP PLC (BRTO)   WILBER S A;  US4633087-A   TREBOR INDS INC (TREB-Non-standard)   ROSENTHAL G K,  STEPHENS J D,  ROSENTHAL R D;  US4678921-A   NIPPONDENSO CO LTD (NPDE)   NAKAMURA T,  SATO S,  HATTORI T,  NABETA T,  KATO M;  US4864126-A   HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERYESZI J,  PETRILLA J F,  PERNYESZI J;  US4865038-A   NOVAMETRIX MED SYST INC (NOVA-Non-standard)   RICH D,  THOMAS S;  US4907594-A   NICOLAY GMBH (NICO-Non-standard)   MUZ E;  US4939375-A   HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERNYESZI J,  PETRILLA J F;  US5036437-A   LECTRON PRODUCTS IN (LECT-Non-standard)   MACKS H R;  US5209230-A   NELLCOR INC (NELL-Non-standard)   SWEDLOW D B,  WARING J,  DELONZO R;  US5237994-A   SQUARE ONE TECHNOLOGY (SQUA-Non-standard)   GOLDBERGER D S;  US5239169-A   MICROSCAN SYSTEMS INC (MICR-Non-standard)   THOMAS J E;  US5325192-A   TEKTRONIX INC (TEKT)   ALLEN D W;  US5373102-A   US SEC OF ARMY (USSA)   DAVENPORT W E,  EHRLICH J J,  TAYLOR T S;  US5561295-A   LITTON SYSTEMS INC (LITO)   PREIS M K,  JACKSEN N F;  US5629517-A   XEROX CORP (XERO)   JACKSON W B,  BIEGELSEN D K,  STREET R A,  WEISFIELD R L;  US5752914-A   NELLCOR PURITAN BENNETT INC (MLCW)   DELONZOR R,  NAMY A;  US5786592-A   HOEK INSTR AB (HOEK-Non-standard)   HOEK B"
  input<-unlist(strsplit(input,split=" "))

Your output:

input[grepl("[0-9]", input)]
 [1] "US6184521-B1" "US3967255-A"  "Q2"           "US3973118-A"  "US4303855-A"  "US4394572-A"  "US4407290-A" 
 [8] "US4633087-A"  "US4678921-A"  "US4864126-A"  "US4865038-A"  "US4907594-A"  "US4939375-A"  "US5036437-A" 
[15] "US5209230-A"  "US5237994-A"  "US5239169-A"  "US5325192-A"  "US5373102-A"  "US5561295-A"  "US5629517-A" 
[22] "US5752914-A"  "US5786592-A"
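
The result above still contains "Q2", which holds a digit but is not a patent number. If only identifiers shaped like "US6184521-B1" are wanted, a stricter pattern can be matched against the unsplit string. The following is a minimal sketch under an assumed format (two capital letters, digits, a hyphen, then a capital letter with an optional digit); the pattern and the shortened string `x` are illustrative assumptions, not something stated in the question.

```r
# Shortened, illustrative input; the real string is much longer
x <- "US6184521-B1 -- US3967255-A   DELPHIAN FOUNDATION (DELP-Non-standard);  Q2 CORP (QTWO-Non-standard)   OLIVER S M;  US3973118-A   LAMONTAGNE J A"

# Assumed identifier format: two capital letters, one or more digits,
# a hyphen, then a capital letter optionally followed by one digit
pattern <- "[A-Z]{2}[0-9]+-[A-Z][0-9]?"

# gregexpr() locates every match; regmatches() extracts the matched text
ids <- regmatches(x, gregexpr(pattern, x))[[1]]

# ids now holds "US6184521-B1", "US3967255-A" and "US3973118-A";
# "Q2" is skipped because it does not fit the assumed format
ids
```

If the real identifiers use other country codes or kind codes, the pattern would need adjusting accordingly.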
Terru_theTerror

A simple grep will do it. Note the argument value = TRUE; its default is FALSE, in which case grep returns the positions of the matching elements rather than the elements themselves.

grep("[[:digit:]]", s, value = TRUE)
# [1] "US6184521-B1" "US3967255-A"  "Q2"           "US3973118-A"  "US4303855-A" 
# [6] "US4394572-A"  "US4407290-A"  "US4633087-A"  "US4678921-A"  "US4864126-A" 
#[11] "US4865038-A"  "US4907594-A"  "US4939375-A"  "US5036437-A"  "US5209230-A" 
#[16] "US5237994-A"  "US5239169-A"  "US5325192-A"  "US5373102-A"  "US5561295-A" 
#[21] "US5629517-A"  "US5752914-A"  "US5786592-A"
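
To make the effect of the value argument concrete, here is a small sketch on a hypothetical four-element vector (`s_demo` is made up for illustration and is not the data from the question). It also shows grepl, which returns a logical vector that can be used for subsetting.

```r
# Hypothetical short vector, for illustration only
s_demo <- c("US6184521-B1", "--", "Q2", "CORP")

# Default (value = FALSE): integer positions of the matching elements
idx <- grep("[[:digit:]]", s_demo)

# value = TRUE: the matching elements themselves
hits <- grep("[[:digit:]]", s_demo, value = TRUE)

# grepl(): one TRUE/FALSE per element, usable as a subsetting mask
mask <- grepl("[[:digit:]]", s_demo)

# All three routes select the same elements
identical(s_demo[idx], hits)
identical(s_demo[mask], hits)
```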

DATA.
The following reads in the data you provided using scan, which splits the text on whitespace, so your actual strings are probably different. It is only meant for testing the code above.

s <- 
scan(what = character(),
text = "US6184521-B1 -- US3967255-A   DELPHIAN FOUNDATION (DELP-Non-standard);
  Q2 CORP (QTWO-Non-standard)   OLIVER S M,  PROUD R A,  PARSONS S J;  
US3973118-A   LAMONTAGNE J A (LAMO-Individual)   LAMONTAGNE J A;  US4303855-A   
IBM CORP (IBMC)   BAPST U H,  GFELLER F,  VETTIGER P;  US4394572-A   BIOX TECH INC 
(BIOX-Non-standard)   WILBER S;  US4407290-A   BIOX TECH INC (BIOX-Non-standard);  
BOC GROUP PLC (BRTO)   WILBER S A;  US4633087-A   TREBOR INDS INC (TREB-Non-standard)   
ROSENTHAL G K,  STEPHENS J D,  ROSENTHAL R D;  US4678921-A   NIPPONDENSO CO LTD 
(NPDE)   NAKAMURA T,  SATO S,  HATTORI T,  NABETA T,  KATO M;  US4864126-A   
HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERYESZI J,  PETRILLA J F,  PERNYESZI 
J;  US4865038-A   NOVAMETRIX MED SYST INC (NOVA-Non-standard)   RICH D,  
THOMAS S;  US4907594-A   NICOLAY GMBH (NICO-Non-standard)   MUZ E;  
US4939375-A   HEWLETT-PACKARD CO (HEWP)   WALTERS M D,  PERNYESZI J,  
PETRILLA J F;  US5036437-A   LECTRON PRODUCTS IN (LECT-Non-standard)   
MACKS H R;  US5209230-A   NELLCOR INC (NELL-Non-standard)   SWEDLOW D B,  
WARING J,  DELONZO R;  US5237994-A   SQUARE ONE TECHNOLOGY (SQUA-Non-standard)   
GOLDBERGER D S;  US5239169-A   MICROSCAN SYSTEMS INC (MICR-Non-standard)   
THOMAS J E;  US5325192-A   TEKTRONIX INC (TEKT)   ALLEN D W;  US5373102-A   
US SEC OF ARMY (USSA)   DAVENPORT W E,  EHRLICH J J,  TAYLOR T S;  
US5561295-A   LITTON SYSTEMS INC (LITO)   PREIS M K,  JACKSEN N F;  
US5629517-A   XEROX CORP (XERO)   JACKSON W B,  BIEGELSEN D K,  STREET R A,  
WEISFIELD R L;  US5752914-A   NELLCOR PURITAN BENNETT INC (MLCW)   
DELONZOR R,  NAMY A;  US5786592-A   HOEK INSTR AB (HOEK-Non-standard)   
HOEK B")
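
As a side note on how the splitting behaves, the sketch below compares scan with strsplit on a single space, using a short made-up string (`txt` is an assumption for illustration). With runs of several spaces, strsplit on " " produces empty strings, while scan treats any run of whitespace as one separator. The empty strings never match a digit pattern, so both approaches give the same filtered result.

```r
# Illustrative string; note the run of three spaces before "Q2"
txt <- "US6184521-B1 -- US3967255-A   Q2 CORP"

# Splits at every single space, so a run of spaces yields empty strings
by_space <- unlist(strsplit(txt, split = " "))

# scan() splits on runs of whitespace; quiet = TRUE suppresses the "Read n items" message
by_scan <- scan(text = txt, what = character(), quiet = TRUE)

length(by_space)  # 7 elements, two of them empty strings
length(by_scan)   # 5 elements

# Dropping the empty strings makes the two results identical
identical(by_space[nzchar(by_space)], by_scan)
```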
Rui Barradas