
I have a CSV with URLs in one column and a column of strings (words) associated with those URLs.

I want to write a function that goes through each URL and, if "/2019/" is present in the URL, assigns it to a new variable called "new_url"; if "/2018/", "/2017/" (etc.) is present, it should assign it to a variable called "old_url".

I also want it to go through each of the words in the first column and, if "2019" or no year at all is present, assign that to a new variable called "new_word".

Example of the columns:
hyundai sonata rebate | https://www.edmunds.com/hyundai/sonata/2018/deals

2017 jeep wrangler | https://www.edmunds.com/jeep/wrangler/2017/deals

2019 honda accord | https://www.edmunds.com/honda/accord/2019/deals

I've been trying to work with this https://gist.github.com/gruber/8891611 but am utterly confused and can't get it to work. Any ideas?!

em4019

1 Answer


Just something simple to get you started:

import re

sample_rows = [
    ("hyundai sonata rebate", "https://www.edmunds.com/hyundai/sonata/2018/deals"),
    ("2017 jeep wrangler", "https://www.edmunds.com/jeep/wrangler/2017/deals"),
    ("2019 honda accord", "https://www.edmunds.com/honda/accord/2019/deals"),
    ("1985 some old car", "https://www.edmunds.com/some/oldcar/1985/deals")
]

for keywords, url in sample_rows:
    # the url: "/2019/" is new, any other /19xx/ or /20xx/ year is old
    if "/2019/" in url:
        new_url = url
        print(f"new_url={new_url}")
    elif re.search(r"/(?:20|19)\d{2}/", url):  # raw string so \d is not a string escape
        old_url = url
        print(f"old_url={old_url}")
    # the "words": new if they mention 2019 or contain no year at all
    if "2019" in keywords:
        new_word = keywords
        print(f"new_word={new_word}")
    elif re.search(r"(?:20|19)\d{2}", keywords) is None:
        new_word = keywords
        print(f"new_word={new_word}")
Jetna
  • Is there a way to do this if instead of creating a list, it loops through rows in a dataframe? – em4019 Mar 18 '19 at 14:53
  • Yes, check this out: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas – Jetna Mar 18 '19 at 14:55
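For the DataFrame follow-up, here is a minimal sketch that avoids an explicit row loop. It assumes pandas is installed, that the two columns are called "keywords" and "url" (hypothetical names), and that a year is any four-digit 19xx/20xx number, as in the answer's regex. The helper names classify_url and is_new_keyword are illustrative, not from any library.

```python
import re

import pandas as pd

# Assumption: a "year" is a four-digit 19xx or 20xx number, matching the answer's regex.
URL_YEAR = re.compile(r"/((?:19|20)\d{2})/")
WORD_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def classify_url(url):
    """Hypothetical helper: "new" for a /2019/ URL, "old" for another year, None if no year."""
    match = URL_YEAR.search(url)
    if match is None:
        return None
    return "new" if match.group(1) == "2019" else "old"


def is_new_keyword(text):
    """Hypothetical helper: True if the words mention 2019 or contain no year at all."""
    years = WORD_YEAR.findall(text)
    return not years or "2019" in years


# With a real file, something like pd.read_csv("your_file.csv", names=["keywords", "url"])
# would build this frame (the filename and column names are assumptions).
df = pd.DataFrame(
    {
        "keywords": ["hyundai sonata rebate", "2017 jeep wrangler", "2019 honda accord"],
        "url": [
            "https://www.edmunds.com/hyundai/sonata/2018/deals",
            "https://www.edmunds.com/jeep/wrangler/2017/deals",
            "https://www.edmunds.com/honda/accord/2019/deals",
        ],
    }
)

# Series.apply calls each helper once per value in the column.
df["url_age"] = df["url"].apply(classify_url)
df["is_new_word"] = df["keywords"].apply(is_new_keyword)

new_urls = df.loc[df["url_age"] == "new", "url"].tolist()
old_urls = df.loc[df["url_age"] == "old", "url"].tolist()
new_words = df.loc[df["is_new_word"], "keywords"].tolist()
print(new_urls, old_urls, new_words, sep="\n")
```

Storing the classification as extra columns keeps every URL next to its keywords, which is usually more useful than collecting them into separate variables that get overwritten on each row.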