I have an input string:
"[u'$799,900', u'$1,698,000', u'$998,000', u'$1,299,000', u'$1,000,000', u'$499,950', u'$995,000', u'$998,000', u'$2,000,000', u'$988,000', u'$979,000', u'$1,285,000', u'$988,000', u'$579,000', u'$700,000', u'$1,100,000', u'$1,557,000', u'$999,888', u'$798,000', u'$998,000', u'$1,050,000', u'$888,000', u'$559,888', u'$774,900', u'$795,000', u'$850,000']","[u'3 bds ', u' 2 ba ', u' 1,361 sqft', u'4 bds ', u' 3 ba ', u' 2,845 sqft', u'3 bds ', u' 3 ba ', u' 1,534 sqft', u'3 bds ', u' 2 ba ', u' 1,762 sqft', u'5 bds ', u' 3 ba ', u' 2,398 sqft', u'2 bds ', u' 2 ba ', u' 956 sqft', u'4 bds ', u' 3 ba ', u' 1,840 sqft', u'3 bds ', u' 2 ba ', u' 1,212 sqft', u'3 bds ', u' 3 ba ', u' 1,878 sqft', u'3 bds ', u' 2 ba ', u' 1,240 sqft', u'3 bds ', u' 2 ba ', u' 1,207 sqft', u'3 bds ', u' 3 ba ', u' 1,905 sqft', u'3 bds ', u' 3.5 ba ', u' 1,591 sqft', u'2 bds ', u' 2 ba ', u' 946 sqft', u'2 bds ', u' 2 ba ', u' 1,067 sqft', u'4 bds ', u' 3 ba ', u' 2,254 sqft', u'5 bds ', u' 4 ba ', u' 2,744 sqft', u'3 bds ', u' 3 ba ', u' 1,291 sqft', u'4 bds ', u' 3 ba ', u' 1,480 sqft', u'3 bds ', u' 2 ba ', u' 1,513 sqft', u'4 bds ', u' 2 ba ', u' 1,846 sqft', u'9 bds ', u' 5 ba ', u' 3,336 sqft', u'2 bds ', u' 2 ba ', u' 983 sqft', u'4 bds ', u' 3 ba ', u' 1,476 sqft', u'3 bds ', u' 3 ba ', u' 1,872 sqft', u'2 bds ', u' 3 ba ', u' 1,459 sqft']"
From it, I need to extract the prices into a list of ints.
This is what I have tried so far:
import re

pattern_price = r'\[u\'\$.*?\]'
patternx = r"(.*?u.*?)(\d+\,\d+\,\d+|\d+\,\d+)"

with open(fpath, "r") as f:
    for line in f.readlines():
        lst = re.findall(pattern_price, line)
        print len(lst)  # I get a list with 1 element?
        newlst = [x.split(patternx) for x in lst]
        print len(newlst)  # I got 1 element again?
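
For reference, the single-element results are expected: pattern_price matches the whole bracketed price list as one match, and str.split treats its argument as a literal separator, not as a regex. Below is a minimal sketch of one possible way to get the ints. It assumes every price starts with a dollar sign and uses commas only as thousands separators. The names extract_prices and sample are hypothetical, and sample is a shortened stand-in for the real input line.

```python
import re

def extract_prices(line):
    """Return every dollar amount found in `line` as an int."""
    # \$ matches a literal dollar sign; the group captures the digits
    # and commas that follow it (assumption: no decimals in prices).
    amounts = re.findall(r"\$(\d[\d,]*)", line)
    # Drop the thousands separators before converting to int.
    return [int(a.replace(",", "")) for a in amounts]

# Shortened stand-in for the real input (hypothetical sample).
sample = "\"[u'$799,900', u'$1,698,000', u'$998,000']\",\"[u'3 bds ', u' 2 ba ', u' 1,361 sqft']\""
prices = extract_prices(sample)
```

Because the pattern requires the leading dollar sign, values such as "1,361 sqft" in the second list should be skipped.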