
I have a dataframe with thousands of URLs and company names that I need to convert to HTML links, along with some formatting. I wrote a function that goes down the list and creates the tags:

def linkcreate():
    if row['url'] == '####' or row['url'] == '#####':
        print('<span style="color: #293789!important; margin-bottom:0;">' + row['name'] + '</span>')
    else:
        print('<a href="' + row['url'] + '" target="_blank">' + row['name'] + '</a>')

The if statement does a bit of cleanup, since a few dozen companies do not have a URL. Those are represented as '####' and '#####' in the df. For those, I add a <span> tag instead of an <a> tag, with some styling that makes it look like a link. The else statement just constructs the link from the two columns in the df.

Another thing I wanted to do was put the first half of the links in <div class="side-1"> and the second half in <div class="side-2">. Below is my code with explanation:

# Calculate the midpoint from the total count of rows in df (integer division)
count = int(data['url'].count()) // 2
# Set counter to 0
counter = 0

for index, row in data.iterrows():
    counter = counter + 1
    # Before the first <a> tag, start the first section <div>
    if counter == 1:
        print('<div class="side-1">')
    # If counter is less than or equal to the halfway point, build the link using linkcreate()
    elif counter <= count:
        linkcreate()
    # If counter is one past the halfway point, add the closing </div> and start the second <div>
    elif counter == count + 1:
        print('</div>')
        print(' ')
        print('<div class="side-2">')
    # If counter is greater than the halfway point, build the rest of the links using linkcreate()
    elif counter > count:
        linkcreate()
# Closing </div> tag for the second set of links.
print('</div>')

This code works but is it the most efficient way to do this?

user3088202
  • why do you use print statement? – Ben.T Jun 13 '18 at 16:17
  • Did you have a look at [this](https://stackoverflow.com/a/20043785/4819376) answer? – rpanai Jun 13 '18 at 16:17
  • @Ben.T I am new to Python. I wanted to practice with a real live scenario. Thought print would be appropriate. What should I be using instead? – user3088202 Jun 13 '18 at 16:28
  • @user32185 Thanks. I will take a look. – user3088202 Jun 13 '18 at 16:28
  • @user3088202 OK, but in the end I assume you want to create a file to use later, or not necessarily? – Ben.T Jun 13 '18 at 16:29
  • @Ben.T For the time being I am just doing a copy/paste from the console. My biggest concern is with the structure of the loop. I have a lot of elif statements in there. Breaking the links into 2 divs based on count is what created all the elifs, and I just wanted to know if there is a more elegant way to do this. – user3088202 Jun 13 '18 at 16:32

1 Answer

To be faster, you can first create a column with the links:

def linkcreate(row):
    if '####' in row['url']: # will catch both '####' and '#####'
        return '<span style="color: #293789!important; margin-bottom:0;">' + row['name'] + '</span>'
    else:
        return '<a href="' + row['url'] + '" target="_blank">' + row['name'] + '</a>'
df['link'] = df.apply(linkcreate,axis=1)

Then, for the printing (which you said is not your main concern):

count = int(count)  # slice bounds must be integers (in Python 3, x / 2 yields a float)
print('<div class="side-1">')
print(df['link'][:count+1].to_string(header=None, index=False))
print('</div>')
print(' ')
print('<div class="side-2">')
print(df['link'][count+1:].to_string(header=None, index=False))
print('</div>')

This prints each part of the link column in one call, without an explicit loop.
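If you later want the HTML as a string (to write to a file, for example) rather than printed output, the same idea can be sketched as below. This is only an illustration under some assumptions: the sample data and the `build_columns` helper are hypothetical, and the split point here puts the extra row in the first block when the row count is odd, which is a choice of this sketch rather than something taken from the question. Because it slices by position, every row ends up in one of the two blocks.

```python
import pandas as pd

# Hypothetical sample data; '####' and '#####' mark a missing URL, as in the question.
df = pd.DataFrame({
    'name': ['Acme', 'Globex', 'Initech', 'Umbrella'],
    'url': ['https://acme.example', '####', 'https://initech.example', '#####'],
})

def linkcreate(row):
    """Return a <span> for placeholder URLs, otherwise an <a> link."""
    if '####' in row['url']:  # catches both '####' and '#####'
        return '<span style="color: #293789!important; margin-bottom:0;">' + row['name'] + '</span>'
    return '<a href="' + row['url'] + '" target="_blank">' + row['name'] + '</a>'

df['link'] = df.apply(linkcreate, axis=1)

def build_columns(links):
    """Hypothetical helper: wrap the first and second half of `links` in two <div> blocks."""
    half = (len(links) + 1) // 2  # first block gets the extra row when the count is odd
    first = '\n'.join(links.iloc[:half])   # rows at positions 0 .. half - 1
    second = '\n'.join(links.iloc[half:])  # the remaining rows
    return ('<div class="side-1">\n' + first + '\n</div>\n\n'
            '<div class="side-2">\n' + second + '\n</div>')

html = build_columns(df['link'])
print(html)
```

Joining the strings yourself also avoids the column alignment padding that `to_string` may add in front of shorter entries.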

Ben.T
  • This is great! So if I see this correctly, you are adding another column to the df by applying the function that creates the links, and in the second part you are printing that column out. The only thing that is not clear to me is the [:count+1] / [count+1:]. Are you accessing the index and limiting the number of rows that are selected? Is [:count+1] from 0 to the halfway point, and [count+1:] from the halfway point to the end? How does it figure out where the half is? – user3088202 Jun 13 '18 at 17:02
  • @user3088202 Indeed, I add a column with the links. For `[:count+1] / [count+1:]`, you are right: from the beginning to the half, and from the half to the end. It is not really accessing the index (the index does not have to be an integer), but taking a slice by position; see this [link](https://pandas.pydata.org/pandas-docs/stable/indexing.html) for more details on slicing. For the half, I used your `count`, which gives half the size of your data, and I slice up to it. – Ben.T Jun 13 '18 at 18:03
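
To make the slicing discussed in these comments concrete, here is a small sketch on a Series whose index is made of letters, so the slice bounds can only be row positions. The `links` data is hypothetical, and `.iloc` is used here (instead of plain `[]`) to make the positional meaning explicit. Note that with `count + 1` as the boundary, the first part holds one row more than half.

```python
import pandas as pd

# Hypothetical Series with a non-integer index, to show that the slice counts rows.
links = pd.Series(['a', 'b', 'c', 'd', 'e', 'f'],
                  index=['u', 'v', 'w', 'x', 'y', 'z'])

count = len(links) // 2  # 3: half the number of rows

first_part = links.iloc[:count + 1]   # rows at positions 0 .. count (count + 1 rows)
second_part = links.iloc[count + 1:]  # every row after that

print(list(first_part))   # ['a', 'b', 'c', 'd']
print(list(second_part))  # ['e', 'f']
```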