I am trying to extract user location data while scraping Twitter. I am having trouble with the regex: specifically, I want to exclude "\n" from the output.
Current regex:
import re

data = open("user_locations.txt", "r")
valid_ex = re.compile(r'([A-Z][a-z]+), ([A-Za-z]+[^\n])')
user_locations.txt:
California, USA
You are your own ExclusiveLogo
Around The World
Galatasaray
★DM 4 PROMO / CONTENT REMOVAL★
Glasgow, Scotland
United States
Berlin, Germany
Global
Expected output:
['California, USA', 'Glasgow, Scotland', 'Berlin, Germany']
Actual output:
['California, USA\n', 'Glasgow, Scotland\n', 'Berlin, Germany\n']
Another possible reason for the discrepancy between the expected and actual output may be the way I am using search() when building and printing the list. That is:
locations_list = []
for line in data:
    result = valid_ex.search(line)
    if result:
        locations_list.append(line)
print(locations_list)
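One way to check that hypothesis is to compare the line that gets appended with the text the regex actually matched. The following is only a minimal sketch: the in-memory `lines` list stands in for the file (each entry keeps its trailing "\n", as lines read from a file do), and the names `appended_lines` and `matched_text` are made up for illustration.

```python
import re

valid_ex = re.compile(r'([A-Z][a-z]+), ([A-Za-z]+[^\n])')

# Hypothetical stand-in for iterating over user_locations.txt;
# lines read from a file keep their trailing newline.
lines = [
    "California, USA\n",
    "Around The World\n",
    "Glasgow, Scotland\n",
    "Berlin, Germany\n",
]

appended_lines = []  # what the loop above stores: the whole input line
matched_text = []    # what the regex itself matched

for line in lines:
    result = valid_ex.search(line)
    if result:
        appended_lines.append(line)           # includes the trailing "\n"
        matched_text.append(result.group(0))  # only the matched substring

print(appended_lines)
print(matched_text)
```

If the two lists differ only by the trailing newline, the "\n" comes from appending `line` rather than from the pattern, and storing `result.group(0)` (or `line.rstrip("\n")`) would be one way to drop it.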
Thank you, any help would be greatly appreciated! :)