
The task is to wrap URLs in an Excel file with HTML tags. For this, I have a function and the following code, which works for one column named ANSWER:

import pandas as pd
import numpy as np
import string
import re

def hyperlinksWrapper(myString):
    # find all substrings that look like a URL
    URLs = re.findall(r"(?P<url>https?://[^\s,<;')]+)", myString)

    # wrap each URL in an <a> HTML tag
    for link in URLs:
        wrappedLink = '<a href="' + link + '">' + link + '</a>'
        myString = myString.replace(link, wrappedLink)

    return myString

#Opening the original XLS file
filename = "Excel.xlsx"
df = pd.read_excel(filename)

#Filling all the empty cells in the ANSWER column with the value "n/a"
df.ANSWER.replace(np.nan, "n/a", inplace=True)

#Going through the ANSWER column and applying hyperlinksWrapper to each cell
for i in range(len(df.ANSWER)):
    df.at[i, "ANSWER"] = hyperlinksWrapper(df.at[i, "ANSWER"])

#Export to a new XLSX file
df.to_excel('Excel_refined.xlsx')

The question is: how do I search not just one column, but every cell of the dataframe, without specifying the exact column names?

Ihor V.
    To loop through columns you can use: `for col in df.columns`. To loop through the whole DataFrame row by row (categorically inefficient, and not recommended) you can use: `for idx, row in df.iterrows()`. The **correct answer** is: research the `df.apply()` function. – S3DEV Aug 03 '20 at 17:39
  • Welcome! Pandas Dataframes have a method `.replace()` [read the docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html). You can pass your regex pattern directly to it and it will replace every instance in every row of every column (i.e. all occurrences in df) with the parameter `value`. Remember to pass `regex=True`. – RichieV Aug 03 '20 at 17:46
  • A couple more suggestions. You can replace `NaN`'s with `.fillna()`. Also, whenever you feel the need to iterate over rows in dfs, read this [Q&A](https://stackoverflow.com/q/16476924/6692898) about vectorization vs looping in pandas. Most likely there is an optimized pandas function for what you want. – RichieV Aug 03 '20 at 17:50
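RichieV's vectorized suggestion can be sketched as follows. This is a hedged illustration, not the asker's code: the column names and the exact URL pattern are made up for the example, and the replacement uses a `\1` backreference to the captured URL.

```python
import pandas as pd
import numpy as np

# toy frame; the column names are illustrative
df = pd.DataFrame({
    'col1': ['see https://example.com now', np.nan],
    'col2': ['https://example.org', 'plain text'],
})

# replace NaNs first, as suggested in the comments
df = df.fillna('n/a')

# wrap every URL in every column in one vectorized call;
# \1 refers to the URL captured by the parentheses
url_pattern = r'(https?://[^\s,<;]+)'
df = df.replace(url_pattern, r'<a href="\1">\1</a>', regex=True)

print(df.loc[0, 'col1'])
# see <a href="https://example.com">https://example.com</a> now
```

Because `replace(..., regex=True)` works element-wise across the whole DataFrame, no loop over columns is needed.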

1 Answer


Perhaps you're looking for something like this:

import pandas as pd
import numpy as np
import string
import re

def hyperlinksWrapper(myString):
    # find all substrings that look like a URL
    URLs = re.findall(r"(?P<url>https?://[^\s,<;')]+)", myString)

    # wrap each URL in an <a> html tag
    for link in URLs:
        wrappedLink = '<a href="' + link + '">' + link + '</a>'
        myString = myString.replace(link, wrappedLink)

    return myString

# dummy dataframe
df = pd.DataFrame(
    {'answer_col1': ['https://example.com', 'https://example.org', np.nan], 
     'answer_col2': ['https://example.net', 'Hello', 'World']}
)

# as suggested in the comments (replaces all NaNs in df)
df.fillna("n/a", inplace=True)

# option 1
# loops over every column of df
for col in df.columns:
    # applies hyperlinksWrapper to every row in col
    df[col] = df[col].apply(hyperlinksWrapper)
    
# [UPDATED] option 2
# applies hyperlinksWrapper to every element of df
# (on pandas >= 2.1, DataFrame.map is the preferred name for applymap)
df = df.applymap(hyperlinksWrapper)

df.head()
akensert
  • Thank you, @akensert, but both your options return the following error: File "C:\Users\ihor.vinnyk\AppData\Local\Programs\Python\Python38\lib\re.py", line 241, in findall return _compile(pattern, flags).findall(string) TypeError: expected string or bytes-like object – Ihor V. Aug 04 '20 at 08:19
  • Hmm, I'm using Python3.6, and it works fine for me. Does the dummy example/dataframe work for you? Perhaps there's a cell in your dataframe that is not a 'string or bytes-like object'. – akensert Aug 04 '20 at 09:36
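The `TypeError` reported above occurs when a cell holds a non-string value (a number or date that `fillna` leaves untouched), since `re.findall` requires a string. A minimal sketch of a guard, wrapping the answer's function; the name `safeWrapper` is made up for this illustration:

```python
import re

def hyperlinksWrapper(myString):
    # find all substrings that look like a URL
    URLs = re.findall(r"(?P<url>https?://[^\s,<;')]+)", myString)
    for link in URLs:
        myString = myString.replace(link, '<a href="' + link + '">' + link + '</a>')
    return myString

def safeWrapper(value):
    # pass anything that isn't a string (numbers, dates, NaN, ...) through unchanged
    if not isinstance(value, str):
        return value
    return hyperlinksWrapper(value)

# then, instead of hyperlinksWrapper:
# df = df.applymap(safeWrapper)
```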