Reading and writing to an Excel sheet using pandas in Python: should I use append or concat, and which method?

Question

I'm writing a small script that reads the ID of an episode from an Excel sheet and fills in its corresponding series name. Here is an example of the Excel sheet that would be used as input:

My script reads the "tconst" value, uses it to find the corresponding episode on IMDb, gets the page title, and uses that to find the name of the series:

import pandas as pd
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

dataset_loc='C:\\Users\\Ghandy\\Documents\\Datasets\\Episodes with over 1k ratings 2020+Small.xlsx'
dataset= pd.read_excel(dataset_loc)

for tconst in dataset['tconst']:
    url='https://www.imdb.com/title/{}/'.format(tconst)
    soup = BeautifulSoup(urlopen(url),features="lxml")
    dataset = dataset.append({"Name": re.findall(r'"([^"]*)"',soup.title.get_text())[0]}, ignore_index=True)
    dataset.to_excel(dataset_loc,index=False)

I have a few problems with this code. First, Python keeps telling me not to use append and to use concat instead, but all the answers on Google and Stack Overflow give examples with append, and I don't know exactly how to use concat.

Second, my data is being appended into a completely new and empty row, not next to the original data where I want it, so in this example I would get "The Mandalorian" at row 4 instead of row 2.

Finally, I want to know whether it's better to add the data one value at a time or to put it all in a temporary list and then add everything at once, and how I would go about doing that with concat.

score 1 · Accepted Answer · answered Apr 19 '22 at 11:14


I can't really tell what your problem with append and concat is: everyone says to use append, and you use append as well. Do you want to use concat instead? Here is a post on the difference between concat and append.
append adds new rows. To write a value into an existing row, you might want to use .at instead.
I would say this depends on how much data you already have and how much you are going to add. To reduce overhead and copying, I would prefer to write directly into the DataFrame, but if a lot happens between the URL call and the write to the DataFrame, collecting the values first could be better.
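To make the options above concrete, here is a minimal sketch rather than a definitive implementation. It assumes a hypothetical `fetch_series_name` helper and made-up IDs in place of the IMDb request, so it runs without network access. It shows why a row-wise `pd.concat` (the replacement for `DataFrame.append`, which was deprecated in pandas 1.4 and removed in 2.0) puts the value in a new row, and three ways to put it next to the existing data instead.

```python
import pandas as pd

# Hypothetical stand-in for the IMDb lookup in the question (no network access).
FAKE_TITLES = {"tt0000001": "Series A", "tt0000002": "Series B"}

def fetch_series_name(tconst):
    return FAKE_TITLES.get(tconst)

dataset = pd.DataFrame({"tconst": ["tt0000001", "tt0000002"]})

# Row-wise concat, like the old append: the value lands in a NEW row.
new_row = pd.DataFrame([{"Name": "Series A"}])
appended = pd.concat([dataset, new_row], ignore_index=True)

# Option 1: write each value into its existing row with .at.
filled = dataset.copy()
filled["Name"] = None  # create the column up front
for index, tconst in filled["tconst"].items():
    filled.at[index, "Name"] = fetch_series_name(tconst)

# Option 2: collect everything in a list, then add the column in one step.
names = [fetch_series_name(t) for t in dataset["tconst"]]
collected = dataset.assign(Name=names)

# Option 3: the same list, attached column-wise with concat (axis=1).
name_column = pd.Series(names, name="Name", index=dataset.index)
side_by_side = pd.concat([dataset, name_column], axis=1)
```

Options 2 and 3 produce the same table. Plain column assignment is usually the simpler choice, while `pd.concat(..., axis=1)` is the direct answer to "how do I do this with concat".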


Stimmot


Well, I'm new to Python, so I don't really have a problem with append; it's Python that does. Every time I use it in the code I get a message that append will be removed in future versions and that I should use concat instead. .at is exactly what I wanted, thank you! Somehow I never found it through googling, and usually I'm pretty decent at that. Anyway, I will update the original post with the answer for the future. – Ghandy kozman Apr 19 '22 at 11:46
Ah, I see. Yeah, sometimes you have to search pretty specifically to find such methods. Glad it helped. :) – Stimmot Apr 19 '22 at 14:22

score 0 · Answer 2 · answered Apr 19 '22 at 11:48


Thanks to @Stimmot's suggestion to use .at, the code now looks like this:

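# The positional counter matches the row labels here because
# read_excel returns a default 0..n-1 index.
for index, tconst in enumerate(dataset['tconst']):
    url = 'https://www.imdb.com/title/{}/'.format(tconst)
    soup = BeautifulSoup(urlopen(url), features="lxml")
    # Write the first double-quoted part of the page title into this row's "Name" cell.
    dataset.at[index, 'Name'] = re.findall(r'"([^"]*)"', soup.title.get_text())[0]
# Save once after the loop; index=False keeps the row index out of the sheet.
dataset.to_excel(dataset_loc, index=False)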
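
The regular expression in the loop takes the first double-quoted part of the page title and raises an IndexError when the title contains no quotes. As a hedged sketch, the extraction could be pulled into a small helper that returns None in that case. The name `series_name_from_title` and the sample title string are made up for illustration; the real IMDb title format is only assumed to wrap the series name in double quotes, as the regex above implies.

```python
import re

def series_name_from_title(page_title):
    """Return the first double-quoted substring of page_title, or None if there is none."""
    match = re.search(r'"([^"]*)"', page_title)
    return match.group(1) if match else None

# Made-up title string in the assumed format, not fetched from IMDb.
sample_title = '"The Mandalorian" Some Episode (TV Episode) - IMDb'
series_name = series_name_from_title(sample_title)
missing_name = series_name_from_title("A title without quotes")
```

Inside the loop, `dataset.at[index, 'Name'] = series_name_from_title(soup.title.get_text())` would then leave an empty cell for pages whose title does not match, instead of stopping the whole run.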


Ghandy kozman


Some formatting would really help this answer. – Friedrich Apr 19 '22 at 11:54
