I am trying to remove urls that may or may not start with http/https from a large text file, which I saved in urldoc in R. The url may start like tinyurl.com/ydyzzlkk or aclj.us/2y6dQKw or pic.twitter.com/ZH08wej40K. Basically I want to remove data before a '/' after finding the space and after a "/" until I find a space. I tried with many patterns and searched many places. Couldn't complete the task. I would help me a lot if you could give some input.
This is the last statement I tried and got stuck for the above problem. urldoc = gsub("?[a-z]+\..\/.[\s]$","", urldoc)
Input would be: A disgrace to his profession. pic.twitter.com/ZH08wej40K In a major victory for religious liberty, the Admin. has eviscerated institution continuing this path. goo.gl/YmNELW nothing like the admin. proposal: tinyurl.com/ydyzzlkk
Output I am expecting is: A disgrace to his profession. In a major victory for religious liberty, the Admin. has eviscerated institution continuing this path. nothing like the admin. proposal:
Thanks.