I am somewhat of a beginner programmer and am looking for help and an explanation of a problem. I am looking to extract the ID numbers from a string into new column, then fill in missing numbers.
I am working with pandas dataframe and I have the following set of street names, some with an ID number and others missing:
*Start station*:
"19th & L St (31224)"
"14th & R St NW (31202)"
"Paul Rd & Pl NW (31602)"
"14th & R St NW"
"19th & L St"
"Paul Rd & Pl NW"
My desired outcome:
*Start station* *StartStatNum*
"14th & R St NW" 31202
"19th & L St" 31224
"Paul Rd & Pl NW" 31602
"14th & R St NW" 31202
"19th & L St" 31224
"Paul Rd & Pl NW" 31602
I am having difficulty after my first step of splitting. I can split based on position with the following:
def Stat_Num(Stat_Num):
return Stat_Num.split('(')[-1].split(')')[0].strip()
db["StartStatNum"] = pd.DataFrame({'Num':db['Start station'].apply(Stat_Num)})
But this gives:
*Start station* *StartStatNum*
"19th & L St (31224)" 31202
"14th & R St NW (31202)" 31224
"Paul Rd & Pl NW (31602)" 31602
"14th & R St NW" "14th & R St NW"
"19th & L St" "19th & L St"
"Paul Rd & Pl NW" "Paul Rd & Pl NW"
The problem would then arise when I want to find/fill StartStatNum with the station ID numbers that I don't have.
I have been trying to get to know str.extract, str.contains, re.findall
and tried the following as a possible stepping stone:
db['Start_S2'] = db['Start_Stat_Num'].str.extract(" ((\d+))")
db['Start_S2'] = db['Start station'].str.contains(" ((\d+))")
db['Start_S2'] = db['Start station'].re.findall(" ((\d+))")
I have also tried the following this from here
def parseIntegers(mixedList):
return [x for x in db['Start station'] if (isinstance(x, int) or isinstance(x, long)) and not isinstance(x, bool)]
However when I pass values in, I get a list 'x' with 1 value. As a bit of a noob, I don't think going the pattern route is best as it will also take in unwanted integers (although I could possibly turn to Nan's as they would be less than 30000 (the lowest value for ID number) I also have an idea that it could be something simple that I'm overlooking, but after about 20 straight hours and a lot of searching, I am at a bit of a loss.
Any help would be extremely helpful.