I've got a large dataset in which one of the columns is a free-text description. I'm processing it in SAS, and as part of that I want to correct some spelling and remove words that don't add any value to what the text is saying.
I've noticed that quite a few Google Maps links have simply been pasted into these descriptions, and I'm trying to remove all of them.
I already have ways of removing complete words and phrases that I define, but these Google Maps links are all slightly different. Is there a way of removing every instance of this type of link? The three examples below show the different forms in which the links appear in my data:
- www.google.co.uk/maps/@51.34735456-2.9327
- https://goo.gl/maps/jFh9RXXm
- https://www.google.com.br/maps/place/Howard+Rd
For example, is there a way of removing just the characters starting from "https://goo", "https://www.goo" and/or "www.goo" all the way up to the next space, and then replacing them with the word "googlemapslink"? Or a way of removing the entire space-delimited string that contains "/maps/"?
Any thoughts would be greatly appreciated :)
Code below (it works, but it isn't really practical, because I'd first have to go through the whole dataset to build a list of all the various forms of the Google Maps links):
data have;
  infile datalines dsd truncover;
  input ID Description :$500. Col3 $ Col4 Col5 Col6;
  datalines;
1,bla bla lay bye my mybla,C1,0,100,0
2,got laybye me tear,C1,0,0,0
3,free mug text i google by,C1,10,100,0
4,house www.google.co.uk/maps/@51.34735456-2.9327 roof tree!?,C1,10,100,0
5,Mug house https://goo.gl/maps/jFh9RXXm mugg muggle,C1,10,0,0
6,mug sky** lay mug by by lay computer https://www.google.com.br/maps/place/Howard+Rd mug mug mugs,C3,0,20,1
;
/* change instances of google maps links to "googlemapslink"*/
data data_1;
  set have;
  Description_new = Description;
  Description_new = tranwrd(Description_new, " mug ", " cup ");
  Description_new = tranwrd(Description_new, " https://goo.gl/maps/jFh9RXXm ", " googlemapslink ");
  Description_new = tranwrd(Description_new, " https://www.google.com.br/maps/place/Howard+Rd ", " googlemapslink ");
  Description_new = tranwrd(Description_new, " www.google.co.uk/maps/@51.34735456-2.9327 ", " googlemapslink ");
run;
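
A regular-expression replacement might avoid having to list every link by hand. The sketch below is an assumption rather than something checked against the full data: it uses PRXCHANGE to replace any space-delimited token containing "/maps/" with "googlemapslink". The output dataset name data_2 and the pattern itself are illustrative, and it reads the have dataset created above.

```sas
/* Sketch only: replace every whitespace-delimited token that contains "/maps/". */
/* data_2 is a hypothetical output name; the pattern is an assumption.          */
data data_2;
  set have;
  length Description_new $500;
  /* \S*  = a run of non-space characters on either side of "/maps/"  */
  /* -1   = replace every match in the string, not just the first     */
  /* /i   = ignore case, so "/Maps/" would be caught as well          */
  Description_new = prxchange('s/\S*\/maps\/\S*/googlemapslink/i', -1, Description);
run;
```

Matching on "/maps/" rather than on a "goo" prefix should leave ordinary words such as "google" in row 3 untouched, although it would also replace any non-Google link that happens to contain "/maps/".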