I have the following code, which creates a dataframe from a text file:
import re
import pandas as pd

dataframe = pd.DataFrame(columns=('State','RegionName'))
with open('/Users/name/Desktop/university_towns.txt',"r") as f_in:
    lines = f_in.readline()
    i = 0
    for line in lines:
        if '[edit]' in line:
            # state header lines are marked with '[edit]'
            states = re.search(r'^([^(\[]+)', line)
        else:
            # every other line is treated as a region name
            countries = re.search(r'^([^(\[]+)', line)
            dataframe.loc[i] = [states,countries]
            i += 1
This gives me the output for 'dataframe' as:
dataframe
    State                                            RegionName
0   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='A'>
1   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='l'>
2   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='a'>
3   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='b'>
4   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='a'>
5   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='m'>
6   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='a'>
7   <re.Match object; span=(0, 7), match='Wyoming'>  None
8   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='e'>
9   <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='d'>
10  <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='i'>
11  <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='t'>
12  <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match=']'>
13  <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='\n'>
However, I would like the cells to contain plain strings rather than regex Match objects.
The result should look something like this:
   State    RegionName
0  Alabama  Auburn
1  Alabama  Florence
2  Alabama  Jacksonville
3  Alabama  Livingston
4  Alabama  Montevallo
5  Alabama  Troy
6  Alabama  Tuscaloosa
Would anybody be able to give me a helping hand?
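For reference, here is a minimal sketch of one way to get plain strings, not a definitive solution. It relies on two things: `re.search` returns a Match object whose `.group(1)` is the captured text as a `str`, and the loop has to run over whole lines rather than over the characters of a single line. The `sample_lines` list and the `parse_towns` helper are hypothetical names introduced for illustration, and the sample contents are only an assumption about what the file looks like.

```python
import re

# Hypothetical sample standing in for the first lines of university_towns.txt;
# the exact file contents are an assumption.
sample_lines = [
    "Alabama[edit]\n",
    "Auburn (Auburn University)[1]\n",
    "Florence (University of North Alabama)\n",
    "Alaska[edit]\n",
    "Fairbanks (University of Alaska Fairbanks)[2]\n",
]

# Same pattern as in the question: everything up to the first '(' or '['.
pattern = re.compile(r'^([^(\[]+)')


def parse_towns(lines):
    """Return (state, region) string pairs from an iterable of text lines."""
    rows = []
    state = None
    for line in lines:  # iterate over whole lines, not single characters
        match = pattern.search(line)
        if match is None:
            continue  # line starts with '(' or '[': nothing to extract
        name = match.group(1).strip()  # .group(1) is the matched text as a str
        if '[edit]' in line:
            state = name  # header line: remember the current state
        elif name:
            rows.append((state, name))  # region line under the current state
    return rows


rows = parse_towns(sample_lines)
```

With the real file, the same helper could be called as `parse_towns(f_in)` inside the `with open(...)` block, since iterating over a file object yields one line at a time. Passing `rows` to `pd.DataFrame(rows, columns=['State', 'RegionName'])` then gives columns that hold strings instead of Match objects.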
The raw txt file can be found here: