-3

I'm writing a fairly simple Python program to find and download videos from a particular site. I would like my script to name each file after the page title, but the title contains various strings I would like to remove. For example:

The title is: 
The Big Bang Theory S09E15 720p HDTV X264-DIMENSION

but the titles are not always consistent. For example:

The title is:
Triple 9 2016 READNFO HDRip AC3-EVO

How can I replace strings if they are present? Maybe create a list or dictionary of possible strings and, if they are present, remove them (or replace them with an empty string)? I have searched extensively but cannot find anything that fits my situation.

Basically, if "HDTV", "HDRip", "720p", "X264", etc. are present, replace them; otherwise carry on.

Jamie Lindsey
  • So what have you tried, and what exactly is the problem with it? You've used [tag:regex], have you *tried writing a regex?* – jonrsharpe May 25 '16 at 17:50
  • Basically it's find `HDTV|HDRip|720p|X264`, replace with nothing. –  May 25 '16 at 18:04
  • @Jack Refer to the dupe; it solves your problem exactly. A related problem is [Censoring a text string using a dictionary and replacing words with Xs. Python](http://stackoverflow.com/q/16675634) – Bhargav Rao May 25 '16 at 19:34

4 Answers

3

Simple example:

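title = 'The Big Bang Theory S09E15 720p HDTV X264-DIMENSION'
replacements = {'720p': '1080p'}  # format: 'substring': 'replacement'

for old, new in replacements.items():  # use .iteritems() on Python 2
    # str.replace returns a new string, so assign the result back;
    # it is a no-op when the substring is absent, so no "in" check is needed
    title = title.replace(old, new)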

The only problem with this approach arises when the substring you want to replace can also occur inside another word. For example, if you wanted to replace 'an' with 'a', the string in this example would become 'The Big Bag Theory ... '. To fix this, I would try breaking the string up into a set of words and comparing the words to the dictionary entries.
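A minimal sketch of that word-by-word idea, assuming words are separated by whitespace. The names `replace_words` and `replacements` are illustrative only and do not come from the code above:

```python
def replace_words(title, replacements):
    """Replace whole whitespace-separated words using a dict lookup."""
    words = title.split()
    # dict.get returns the word unchanged when it has no entry
    return " ".join(replacements.get(word, word) for word in words)


title = "The Big Bang Theory S09E15 720p HDTV X264-DIMENSION"
print(replace_words(title, {"an": "a", "720p": "1080p"}))
```

Because 'an' is compared against whole words only, 'Bang' is left alone. The same rule means a token such as 'X264-DIMENSION' counts as one word, so an entry for 'X264' alone would not match it.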

Joshua Howard
2
for undesired_word in ("HDTV", "HDRip", "720p", "X264"):
    title = title.replace(undesired_word, "")
Kevin
  • It's too slow this way, use a regex with alternations and do it in one shot. –  May 25 '16 at 18:05
  • @Kevin could I do `dict = {'720p':'1080p'} for undesired_word in dict: title = title.replace(undesired_word, "")`? I'm fairly new to 'non-basic' Python, so still learning. – Jamie Lindsey May 25 '16 at 18:21
0
title = 'The Big Bang Theory S09E15 720p HDTV X264-DIMENSION'

if 'HDTV' in title:
    title = title.replace('HDTV', '')

Not very Pythonic, but it will do what you want.

WildCard
  • 2
    Not much point in having the conditional there. If 'HDTV' isn't in the title, then `title = title.replace('HDTV', '')` doesn't do anything, so it's harmless to run it anyway. – Kevin May 25 '16 at 18:03
  • Good to know. I figured title.replace would raise an error if the unwanted word wasn't present. – WildCard May 25 '16 at 18:07
0

Kevin's answer will work for you, but just in case you find yourself wanting to use a regex:

import re
string_to_replace = ["HDTV", "HDRip", "720p", "X264"]
regex_string = r"|".join(string_to_replace)
S = "The Big Bang Theory S09E15 720p HDTV X264-DIMENSION"
new_string = re.sub(regex_string, "", S, flags=re.I)
print(new_string)

prints:

The Big Bang Theory S09E15   -DIMENSION

Also, as you will notice, the spaces that followed the strings you replaced are still there. If you do not want that, you can change string_to_replace to include the trailing spaces, like so: ["HDTV ", "HDRip ", "720p ", "X264 "]. Note that "X264 " then no longer matches, because in this title X264 is followed by a hyphen rather than a space, so the output becomes:

The Big Bang Theory S09E15 X264-DIMENSION
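If you would rather keep the tags free of trailing spaces, one possible variation on the regex above is to remove the tags first and tidy the leftover whitespace in a second step. This is only a sketch: `TAGS`, `TAG_PATTERN` and `strip_tags` are made-up names, the tag list is just the one from the question, and the `\b` word boundaries and `re.escape` call are additions that the snippet above does not use.

```python
import re

# Illustrative tag list, taken from the examples in the question.
TAGS = ["HDTV", "HDRip", "720p", "X264"]

# re.escape protects against tags containing regex metacharacters;
# \b stops a tag from matching in the middle of a longer word.
TAG_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(tag) for tag in TAGS) + r")\b",
    re.IGNORECASE,
)


def strip_tags(title):
    """Remove every known tag, then collapse the whitespace left behind."""
    without_tags = TAG_PATTERN.sub("", title)
    return re.sub(r"\s+", " ", without_tags).strip()


print(strip_tags("The Big Bang Theory S09E15 720p HDTV X264-DIMENSION"))
print(strip_tags("Triple 9 2016 READNFO HDRip AC3-EVO"))
```

With this approach X264 is removed even though a hyphen follows it, since a hyphen also counts as a word boundary. Strings that are not in the list (READNFO, AC3-EVO, the -DIMENSION suffix) are left in place.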
Chrispresso
  • @ ZWiki thanks, that seems like my preference so far, but what does "|" represent, please? – Jamie Lindsey May 25 '16 at 18:26
  • @JackHerer, oh sorry. The "|" is like a logical OR. The join produces "HDTV|HDRip|720p|X264", which in a regex means: if any of those strings matches, replace it with nothing. The `flags=re.I` makes the match ignore case; you can drop it if you want the match to be case-sensitive. – Chrispresso May 25 '16 at 18:28
  • @ ZWiki ..excellent explanation, cheers, and I am replacing spaces with hyphens after replacing undesired strings – Jamie Lindsey May 25 '16 at 18:40
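
For the follow-up step mentioned in the last comment (replacing spaces with hyphens), a rough sketch might look like the following. `to_filename` is a hypothetical helper, and it assumes the undesired strings have already been removed; it does not deal with characters that are invalid in file names.

```python
import re


def to_filename(cleaned_title):
    """Join the remaining words with single hyphens."""
    # Treat any run of whitespace and hyphens as one separator, so the
    # gaps left by removed strings do not produce doubled hyphens.
    return re.sub(r"[\s-]+", "-", cleaned_title).strip("-")


print(to_filename("The Big Bang Theory S09E15   -DIMENSION"))
```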