I have a pandas DataFrame in which one column holds URLs. I would like to match those URLs against a pattern and count how many of them match.
My logic is that if the match does not return 'None', then, at this stage, it should print('Match'). However, that does not appear to work. I have just come back from using a lot of R and don't have much experience with pandas and data frames in Python, so I would appreciate any tips on how to match a value using pandas. Here is a sample of my data and my current code.
Sample data:
```
Title,URL,Date,Unique Pageviews
Preparing and Starting DS career,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:242750,20-Jan-15,163
The Rogue Data Scientist,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:273425,4-May-15,1108
Is it safe to code after one bottle of wine?,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:349416,9-Nov-15,1736
Short-Term Forecasting of Electricity Demand,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:350421,12-Nov-15,1117
Visual directory of 339 tools. Wow!,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:373786,14-Jan-16,4228
8 Types of Data,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:377008,23-Jan-16,2829
Very funny video for people who write code,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:379578,30-Jan-16,2444
```
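To experiment without the file, the sample rows can be loaded straight from a string. This is a minimal sketch that assumes the rows are reproduced exactly, one record per line, and uses `io.StringIO` in place of reading `HDT_data5.txt` from disk; the name `SAMPLE` is just for illustration.

```python
import io

import pandas as pd

# The sample rows, one record per line (the URLs themselves contain no commas)
SAMPLE = """Title,URL,Date,Unique Pageviews
Preparing and Starting DS career,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:242750,20-Jan-15,163
The Rogue Data Scientist,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:273425,4-May-15,1108
Is it safe to code after one bottle of wine?,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:349416,9-Nov-15,1736
Short-Term Forecasting of Electricity Demand,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:350421,12-Nov-15,1117
Visual directory of 339 tools. Wow!,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:373786,14-Jan-16,4228
8 Types of Data,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:377008,23-Jan-16,2829
Very funny video for people who write code,http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:379578,30-Jan-16,2444
"""

# read_csv accepts any file-like object, so StringIO stands in for the file
df = pd.read_csv(io.StringIO(SAMPLE))
print(df.shape)  # (7, 4)
print(df.columns.tolist())
```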
Code:
```python
import re

import numpy as np
import pandas as pd


def count_set_words(as_pandas):
    # Try to match '/forum' in the URL column and print "Match" if any row matches
    reg_exp = re.match(r'\b/forum', as_pandas['URL']).any()
    if as_pandas['URL'].str.match(reg_exp, case=False, flags=0, na=np.nan).any():
        print("Match")


def set_new_columns(as_pandas):
    titles_list = ['Year > 2014', 'Forum', 'Blog', 'Python', 'R',
                   'Machine_Learning', 'Data_Science', 'Data', 'Analytics']
    # Append one zero-filled column per title at the right-hand end
    for title in titles_list:
        as_pandas.insert(len(as_pandas.columns), title, 0)


def open_as_dataframe(file_name_in):
    return pd.read_csv(file_name_in, encoding='windows-1251')


def main():
    multi_sets = open_as_dataframe('HDT_data5.txt')
    set_new_columns(multi_sets)
    count_set_words(multi_sets)


main()
```
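`re.match` operates on one string at a time, so handing it the whole `as_pandas['URL']` column raises a `TypeError` before the `None` check is ever reached. pandas instead offers vectorized string methods under `Series.str`. Below is a minimal sketch of one way to match and count, assuming the goal is to count URLs that contain the literal path segment `/forum/`; `count_matches` is a hypothetical helper name, and the third URL is made up for contrast.

```python
import pandas as pd


def count_matches(urls: pd.Series, pattern: str) -> int:
    """Count entries of `urls` that contain the regex `pattern` anywhere."""
    # str.contains searches anywhere in each string and returns a boolean Series;
    # na=False treats missing URLs as non-matches instead of propagating NaN
    mask = urls.str.contains(pattern, case=False, regex=True, na=False)
    # Summing booleans counts the True values
    return int(mask.sum())


df = pd.DataFrame({"URL": [
    "http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:242750",
    "http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:273425",
    "http://www.datasciencecentral.com/profiles/blogs/example",  # hypothetical non-forum URL
    None,  # a missing value
]})

n = count_matches(df["URL"], r"/forum/")
print(n)  # 2
if n:
    print("Match")

# Contrast: str.match anchors at the start of each string (like re.match),
# and these URLs all start with "http://", so it matches nothing
print(df["URL"].str.match(r"/forum/", na=False).sum())  # 0
```

Summing the boolean mask plays the same role as `sum(grepl(...))` in R.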
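The columns added by `set_new_columns` look like per-row flags. If that is the intent (an assumption), they can be filled directly with the same vectorized string methods rather than inserting zeros first, and summing each column then gives the count per keyword. This is a sketch under stated assumptions: the hypothetical `KEYWORD_RULES` mapping (which source column and pattern define each flag) and the hypothetical helper `add_flag_columns` are illustrative only, and `Date` is assumed to follow the day-month-year format of the sample, such as `20-Jan-15`.

```python
import pandas as pd

# Hypothetical rules: new column name -> (source column, regex pattern)
KEYWORD_RULES = {
    "Forum": ("URL", r"/forum/"),
    "Data": ("Title", r"\bdata\b"),
    "Python": ("Title", r"\bpython\b"),
}


def add_flag_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one 0/1 column per rule plus 'Year > 2014'."""
    out = df.copy()
    for column, (source, pattern) in KEYWORD_RULES.items():
        matched = out[source].str.contains(pattern, case=False, regex=True, na=False)
        out[column] = matched.astype(int)
    # Assumed date format: day, abbreviated month name, two-digit year
    dates = pd.to_datetime(out["Date"], format="%d-%b-%y")
    out["Year > 2014"] = (dates.dt.year > 2014).astype(int)
    return out


df = pd.DataFrame({
    "Title": ["Preparing and Starting DS career", "8 Types of Data",
              "The Rogue Data Scientist"],
    "URL": [
        "http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:242750",
        "http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:377008",
        "http://www.datasciencecentral.com/forum/topic/show?id=6448529:Topic:273425",
    ],
    "Date": ["20-Jan-15", "23-Jan-16", "4-May-15"],
})

flags = add_flag_columns(df)
# Column sums give the number of rows flagged for each keyword
print(flags[["Forum", "Data", "Python", "Year > 2014"]].sum())
```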