
I have the following code, which creates a DataFrame from a text file:

import re
import pandas as pd

dataframe = pd.DataFrame(columns=('State', 'RegionName'))
with open('/Users/name/Desktop/university_towns.txt', "r") as f_in:
    lines = f_in.readline()
    i = 0
    for line in lines:
        if '[edit]' in line:
            states = re.search(r'^([^(\[]+)', line)
        else:
            countries = re.search(r'^([^(\[]+)', line)
            dataframe.loc[i] = [states, countries]
            i += 1
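The single-character RegionName values are consistent with how `readline()` behaves: it returns only the first line of the file as one string, and a `for` loop over a string yields its characters one at a time. The 14 rows in the output (indices 0 to 13) match the 14 characters of `'Alabama[edit]\n'`. Because no single character contains `[edit]`, `states` is never assigned inside this loop, so the `Wyoming` value presumably survives from an earlier run in the same session. The sketch below swaps in `io.StringIO` for the real file (an assumption, only so it runs without the file) to contrast `readline()` with iterating over the file object:

```python
import io

# Stand-in for the real file: two lines in the same layout as university_towns.txt
sample = "Alabama[edit]\nAuburn (Auburn University)[1]\n"

with io.StringIO(sample) as f_in:
    first_line = f_in.readline()   # reads ONE line, returned as a single str
chars = [c for c in first_line]    # looping over a str yields single characters

with io.StringIO(sample) as f_in:
    lines = [line for line in f_in]  # looping over the file object yields whole lines

print(chars[:3])  # ['A', 'l', 'a']
print(lines)      # ['Alabama[edit]\n', 'Auburn (Auburn University)[1]\n']
```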

This gives me the following output for 'dataframe':

dataframe
                                              State                                  RegionName
0   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='A'>
1   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='l'>
2   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='a'>
3   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='b'>
4   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='a'>
5   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='m'>
6   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='a'>
7   <re.Match object; span=(0, 7), match='Wyoming'>                                        None
8   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='e'>
9   <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='d'>
10  <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='i'>
11  <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match='t'>
12  <re.Match object; span=(0, 7), match='Wyoming'>   <re.Match object; span=(0, 1), match=']'>
13  <re.Match object; span=(0, 7), match='Wyoming'>  <re.Match object; span=(0, 1), match='\n'>
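Each cell holds whatever `re.search` returned: an `re.Match` object, or `None` when the pattern finds nothing (row 7 is the lone `[` character, which the pattern's character class excludes). The matched text itself is available from the match via `.group(1)`. A short sketch using the same pattern as the code above, with `.strip()` added as an assumption to drop the space that precedes `(`:

```python
import re

PATTERN = r'^([^(\[]+)'  # everything from the start up to the first '(' or '['

m = re.search(PATTERN, 'Auburn (Auburn University)[1]')
print(repr(m.group(1)))      # 'Auburn '  (note the trailing space)
print(m.group(1).strip())    # Auburn

# A line starting with '[' has nothing before the bracket, so search returns None
print(re.search(PATTERN, '[edit]'))  # None
```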

However, I would like the cells to hold plain strings rather than regex match objects.

That is, it should look something like this:

    State    RegionName
0   Alabama  Auburn
1   Alabama  Florence
2   Alabama  Jacksonville
3   Alabama  Livingston
4   Alabama  Montevallo
5   Alabama  Troy
6   Alabama  Tuscaloosa
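A minimal sketch of one way to get that output: iterate over the file's lines (rather than the characters of `f_in.readline()`) and store `m.group(1)` strings instead of `re.Match` objects. It assumes, from the linked file, that state lines contain `[edit]` and every other non-empty line is a town followed by a parenthesised note and/or a `[n]` footnote. `parse_towns` is a hypothetical helper name, and the sample lines stand in for the real file:

```python
import re

PATTERN = re.compile(r'^([^(\[]+)')  # text before the first '(' or '['


def parse_towns(lines):
    """Return (state, region) string pairs from lines in the university_towns.txt layout."""
    rows = []
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = PATTERN.search(line)
        if m is None:
            continue  # line starts with '(' or '[': nothing to extract
        name = m.group(1).strip()  # a plain str, not an re.Match
        if '[edit]' in line:
            state = name  # remember the current state for the towns that follow
        else:
            rows.append((state, name))
    return rows


sample = [
    "Alabama[edit]\n",
    "Auburn (Auburn University)[1]\n",
    "Florence (University of North Alabama)\n",
    "Jacksonville (Jacksonville State University)[2]\n",
]
rows = parse_towns(sample)
print(rows)
# [('Alabama', 'Auburn'), ('Alabama', 'Florence'), ('Alabama', 'Jacksonville')]
```

With the real file, the rows could be collected with `with open('/Users/name/Desktop/university_towns.txt') as f_in: rows = parse_towns(f_in)` and turned into a frame with `pd.DataFrame(rows, columns=['State', 'RegionName'])`, which gives the default 0-based index shown above. Building the DataFrame once from a list is also generally faster than growing it row by row with `.loc`.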

Would anybody be able to give me a helping hand?

The raw txt file can be found here:

https://raw.githubusercontent.com/irJERAD/Intro-to-Data-Science-in-Python/master/MyNotebooks/university_towns.txt

Caledonian26
