
I have a string which contains some data I parse from the web, and I make a file named after this data.

import urllib

string = urllib.urlopen("http://example.com").read()
f = open(path + "/" + string + ".txt", "w")  # "w": open the file for writing
f.write("abcdefg")
f.close()

The problem is that it may include one of these characters: \ / * ? : " < > |. I'm using Windows, where those characters are forbidden in file names. Also, string is in Unicode format, which makes most of the solutions useless.

So, my question is: what is the most efficient / pythonic way to strip those characters? Thanks in advance!

Edit: the filename is a unicode object, not a str!

ohad987
  • http://stackoverflow.com/questions/1033424/how-to-remove-bad-path-characters-in-python – NPE Dec 25 '14 at 12:25
  • @NPE Sorry! I googled before but found nothing. Anyway, maybe there are better solutions, so I'll leave the question up. – ohad987 Dec 25 '14 at 12:27

2 Answers


We don't know what your data looks like, but you can use re.sub:

import re
# Delete every character Windows forbids in file names.
your_string = re.sub(r'[\\/*?:"<>|]', "", your_string)
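A minimal Python 3 sketch of the same regex idea, with the pattern compiled once so it can be reused for many names. In Python 3, str is always Unicode, so the unicode/str distinction from the question goes away. sanitize_filename and its replacement parameter are hypothetical names for illustration; passing e.g. "_" keeps the pieces of the name visibly separated instead of gluing them together.

```python
import re

# The nine characters the question lists as forbidden in Windows file names.
_FORBIDDEN_CHARS = re.compile(r'[\\/*?:"<>|]')


def sanitize_filename(name, replacement=""):
    """Return name with every forbidden character replaced by replacement.

    Hypothetical helper; name is a Python 3 str (always Unicode).
    """
    return _FORBIDDEN_CHARS.sub(replacement, name)


print(sanitize_filename('report: 2014/12/25 "draft"?'))  # -> report 20141225 draft
print(sanitize_filename("a<b>c", "_"))  # -> a_b_c
```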
Hackaholic

A fast way to do this is to use unicode.translate (see its documentation):

In [31]: _unistr = u'sdfjkh,/.,we/.,132?.?.23490/,/'  # any random string

In [48]: remove_punctuation_map = dict((ord(char), None) for char in '\\/*?:"<>|')

In [49]: _unistr.translate(remove_punctuation_map)
Out[49]: u'sdfjkh,.,we.,132..23490,'
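In Python 3 the unicode type became str, and the same deletion table can be built with str.maketrans instead of a hand-written dict. A minimal sketch, assuming Python 3; FORBIDDEN_TABLE and strip_forbidden are hypothetical names:

```python
# str.maketrans("", "", chars) returns a dict mapping ord(char) -> None for
# each char, i.e. the same table as the dict(...) expression in the answer.
FORBIDDEN_TABLE = str.maketrans("", "", '\\/*?:"<>|')


def strip_forbidden(name):
    """Delete every character that maps to None in FORBIDDEN_TABLE."""
    return name.translate(FORBIDDEN_TABLE)


print(strip_forbidden("sdfjkh,/.,we/.,132?.?.23490/,/"))  # -> sdfjkh,.,we.,132..23490,
```

Building the table once at module level means each call only pays for the translate itself.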

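To remove all punctuation, build the map from string.punctuation instead (this needs import string; note that the variable named string in the question's code would shadow that module):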

In [46]: remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)

In [47]: _unistr.translate(remove_punctuation_map)
Out[47]: u'sdfjkhwe13223490'
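A minimal Python 3 sketch of the "remove all punctuation" variant, assuming Python 3 str; ALL_PUNCTUATION_TABLE and strip_punctuation are hypothetical names. string.punctuation contains only the 32 ASCII punctuation characters, so non-ASCII punctuation such as « » is left in place.

```python
import string

# string.punctuation is ASCII-only; it includes all nine characters
# Windows forbids in file names.
ALL_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Remove every ASCII punctuation character from text."""
    return text.translate(ALL_PUNCTUATION_TABLE)


print(strip_punctuation("sdfjkh,/.,we/.,132?.?.23490/,/"))  # -> sdfjkhwe13223490
print(strip_punctuation("«café»!"))  # -> «café»
```

Note the trade-off: this also removes characters that are legal in Windows file names, such as ".", "-" and "_", so stripping only the forbidden set is usually the better fit for building file names.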
Vishnu Upadhyay