
I am working on a web-scraping script. The pandas DataFrame it generates is fantastic; however, I need to add a unique_id column containing the value of an href URL found in the HTML.

<td><a href="/admin/tasks/edit/82689"> ADDRESS </a> CLIENT </td>

Currently the pandas DataFrame has a column containing 'ADDRESS CLIENT', but how can I add a separate column containing the href URL?

I can already get a Python list of the unique_id values using the following:

unique_id = [a['href'] for a in table.select('a[href]')]

Any direction would be much appreciated!

Josh

1 Answer

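One way to do it is with the pandas.DataFrame.assign method:

df = df.assign(url=unique_id)

This returns a new DataFrame with a column named url holding the values from the unique_id list. assign does not modify df in place, so the result has to be assigned back. The list must have the same length as the DataFrame, otherwise pandas raises a ValueError. Plain column assignment such as df['url'] = unique_id is not deprecated and works just as well. You can read more in the pandas documentation for DataFrame.assign.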
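If some rows have no link, a flat list of hrefs will not line up with the DataFrame rows. A minimal sketch of one way around that is to collect the text and the href row by row, so each URL stays attached to its own row. The sample HTML, the details column name, and the extra rows below are assumptions made for illustration, not the asker's actual page:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the single row shown in the question.
html = """
<table>
  <tr><td><a href="/admin/tasks/edit/82689"> ADDRESS </a> CLIENT </td></tr>
  <tr><td><a href="/admin/tasks/edit/82690"> ADDRESS 2 </a> CLIENT 2 </td></tr>
  <tr><td> NO LINK </td></tr>
</table>
"""

table = BeautifulSoup(html, "html.parser").find("table")

records = []
for row in table.select("tr"):
    # First anchor with an href in this row, or None if the row has no link.
    link = row.select_one("a[href]")
    records.append({
        "details": row.get_text(" ", strip=True),  # assumed column name
        "unique_id": link["href"] if link else None,
    })

df = pd.DataFrame(records)
print(df)
```

Rows without a link get a missing value in unique_id instead of shifting the remaining URLs up by one.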

b.vasilev