I need to get the string between ~ and ^

I have a string like this:

:::ABC???,:::DEF???

I need to get the strings between the delimiters with Python.

I want to do this because I am trying to extract text from an HTML page, like this example:

<td class="cell-1">
    <div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div>
</td>

2 Answers

You can use the str.isalpha() method in a generator expression to keep only the letters, then combine them into a single string using join().

def extract_string(s):
    return ''.join(i for i in s if i.isalpha())

Sample output:

print(extract_string(':::ABC???,:::DEF???'))
# ABCDEF

However, that extracts every alphabetic character in the string. If you want only the characters between ~ and ^:

import re

def extract_string(s):
    # Capture each run of ASCII letters that sits between '~' and '^'.
    return re.findall(r"~([a-zA-Z]*)\^", s)

Sample output:

s = '&nbsp;~ABC^,~DEF^'
print(extract_string(s))
# ['ABC', 'DEF']

Just a side note: if you're parsing HTML with regex and/or string manipulation, then, as the famous S.O. reply suggests, please use an HTML parser such as the Beautiful Soup library instead :D!
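
As a rough sketch of that advice, the example below pulls the text out of the span with a parser first and only then applies the regex. It uses the standard library's html.parser rather than Beautiful Soup, purely so that it runs without extra installs. The names SpanTextCollector and extract_values are made up for this illustration, and it assumes the values always live in a span with class "value-frame", as in the question's snippet.

```python
import re
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collects the text found inside <span class="value-frame"> elements.

    Hypothetical helper for illustration; the class name to look for is an
    assumption taken from the question's sample HTML.
    """

    def __init__(self):
        super().__init__()  # character references such as &nbsp; are decoded
        self.depth = 0      # greater than 0 while inside a matching span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            # Count nested spans too, so the matching end tag is found.
            if self.depth or "value-frame" in classes:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_values(html):
    """Return every substring between '~' and '^' in the collected text."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return re.findall(r"~(.*?)\^", "".join(parser.chunks))


html = '''<td class="cell-1">
    <div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div>
</td>'''
print(extract_values(html))
```

The regex now only ever sees the span's text, so markup elsewhere on the page that happens to contain ~ or ^ cannot produce false matches.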


It seems like you want ABC and DEF, so you need to write the regex with a non-greedy group, (.*?), like this:

import re
target = ' <td class="cell-1"><div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div></td>'
matchObj = re.findall(r'~(.*?)\^', target)
print(matchObj)
# ['ABC', 'DEF']

You can learn more in the documentation for the re module.
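
To illustrate why the ? after .* matters, here is a small comparison on the question's sample text. The variable names are only for this sketch. The negated character class in the last line is an alternative pattern that is not part of the answer above.

```python
import re

s = '~ABC^,~DEF^'

# Greedy: .* grabs as much as possible, so the match runs to the last '^'.
greedy = re.findall(r'~(.*)\^', s)

# Non-greedy: .*? stops at the first '^' it can, giving one match per pair.
lazy = re.findall(r'~(.*?)\^', s)

# Alternative: match anything except '^' between the delimiters.
negated = re.findall(r'~([^^]*)\^', s)

print(greedy)   # ['ABC^,~DEF']
print(lazy)     # ['ABC', 'DEF']
print(negated)  # ['ABC', 'DEF']
```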
