I’ve got a dictionary structured like this:
dict = {'series_1': ['id_series', [['season_1', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], ['season_2', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], …]], 'series_2': ['id_series', [['season_1', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], ['season_2', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], …]], …}

Here's a sample:

{'Scooby-Doo! Mystery Incorporated': ['1660055', [['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725'], ['All Fear the Freak', 'https://dl.opensubtitles.org/it/download/sub/4797755']]], ['season 2', [['Through the Curtain', 'https://dl.opensubtitles.org/it/download/sub/5465599'], ['Come Undone', 'https://dl.opensubtitles.org/it/download/sub/5465681']]]]], 'Scooby e Scrappy Doo': ['0084970', [['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643'], ['The Mark of Scooby/The Crazy Carnival Caper', 'https://dl.opensubtitles.org/it/download/sub/6086649']]]]]}

I want to store this data in a pandas DataFrame laid out like this:

series_title    id_series   #season   ep_title      ep_url
series_1        #           1         title_1       #
series_1        #           1         title_2       #
series_1        #           2         title_1       #
series_2        #           1         title_1       #
series_2        #           2         title_1       #
series_2        #           2         title_2       #

etc.

I tried to apply solutions from other questions (such as "Construct pandas DataFrame from items in nested dictionary"), but their data structures are too different from mine and I couldn't adapt them. Can anybody help me? Thanks.

1 Answer

Each level here is a list whose first element is a label (the series id or the season name) and whose second element is another nested list, which makes the simple automatic loading approaches difficult in this case. I would recommend just unpacking the nested dict by hand and building a list of records.

import pandas as pd

# d is the nested dict from the question
records = []
for series_name, series_data in d.items():
    series_id = series_data[0]
    # each season is a [season_name, episode_list] pair
    for season_name, episode_list in series_data[1]:
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_name, episode_name, episode_url])
df = pd.DataFrame.from_records(records, columns=["series_title", "series_id", "season_name", "ep_title", "ep_url"])

To match the exact format from the question, with the column names used there and the season number as an int:

records = []
for series_name, series_data in d.items():
    series_id = series_data[0]
    for season_name, episode_list in series_data[1]:
        # "season 1" -> 1; split() also handles two-digit seasons
        season_number = int(season_name.split()[-1])
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_number, episode_name, episode_url])
df = pd.DataFrame.from_records(
    records, columns=["series_title", "id_series", "season", "ep_title", "ep_url"]
)
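
For anyone who wants to try this end to end, here is a minimal self-contained sketch that runs the same unpacking on the sample data from the question. The helper name flatten_series is hypothetical (it is not a pandas function), and the regex assumes every season label ends in digits, as in "season 1"; adjust it if your labels look different.

```python
import re

import pandas as pd

# Sample data copied from the question.
d = {
    "Scooby-Doo! Mystery Incorporated": [
        "1660055",
        [
            ["season 1", [
                ["Pawn of Shadows", "https://dl.opensubtitles.org/it/download/sub/4797725"],
                ["All Fear the Freak", "https://dl.opensubtitles.org/it/download/sub/4797755"],
            ]],
            ["season 2", [
                ["Through the Curtain", "https://dl.opensubtitles.org/it/download/sub/5465599"],
                ["Come Undone", "https://dl.opensubtitles.org/it/download/sub/5465681"],
            ]],
        ],
    ],
    "Scooby e Scrappy Doo": [
        "0084970",
        [
            ["season 1", [
                ["Scooby Roo/Scooby's Gold Medal Gambit", "https://dl.opensubtitles.org/it/download/sub/6086643"],
                ["The Mark of Scooby/The Crazy Carnival Caper", "https://dl.opensubtitles.org/it/download/sub/6086649"],
            ]],
        ],
    ],
}


def flatten_series(data):
    """Return one dict per episode (hypothetical helper, not a pandas API)."""
    rows = []
    for series_title, (id_series, seasons) in data.items():
        for season_name, episodes in seasons:
            # Assumption: the label ends in digits, e.g. "season 12" -> 12.
            season = int(re.search(r"\d+$", season_name.strip()).group())
            for ep_title, ep_url in episodes:
                rows.append({
                    "series_title": series_title,
                    "id_series": id_series,
                    "season": season,
                    "ep_title": ep_title,
                    "ep_url": ep_url,
                })
    return rows


# A list of dicts gives the column names directly, in insertion order.
df = pd.DataFrame(flatten_series(d))
print(df)
```

Note that id_series is kept as a string here, so an id such as '0084970' keeps its leading zero.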
mmdanziger