-1

I am trying to automate loading 12 pickle files that have similar names using a for loop.

I have AirBnB data for 3 different cities (Jersey City, New York City and Rio), and each city has 4 types of files (listings, calendar, locale, and reviews). That makes 12 files in total, with very similar names (city_fileType.pkl).

  jc_listings.pkl, jc_calendar.pkl, jc_locale.pkl, jc_reviews.pkl  # Jersey City dataset
  nyc_listings.pkl, nyc_calendar.pkl, nyc_locale.pkl, nyc_reviews.pkl # New York City dataset
  rio_listings.pkl, rio_calendar.pkl, rio_locale.pkl, rio_reviews.pkl # Rio dataset

I am trying to automate the loading of these files.

When I run the code:

path_data = '../Data/' # local path

jc_listings = pd.read_pickle(path_data+'jc_listings.pkl')

jc_listings.info()

This works fine.

But when I try to automate it, it does not work properly. I am trying:

# load data
path_data = '../Data/'

#list of all data names
city_data = ['jc_listings','jc_calendar','jc_locale','jc_reviews',
             'nyc_listings','nyc_calendar','nyc_locale','nyc_reviews',
             'rio_listings','rio_calendar','rio_locale','rio_reviews']

# loop to load all the data with respective name
for city in city_data:
    data_name = city
    print(data_name) # just to inspect and troubleshoot
    city = pd.read_pickle(path_data+data_name+'.pkl')
    print(type(city)) # just to inspect and troubleshoot

This runs without errors and the printouts look fine. However, when I try

rio_reviews.info()

I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [37], line 3
      1 # inspecting the data
----> 3 rio_reviews.info()

NameError: name 'rio_reviews' is not defined
  • `city = pd.read_pickle(...)` This does not create a new variable that is named with the _value_ of city. It just creates a variable named `city`. – John Gordon Dec 14 '22 at 02:04
  • What is `rio_reviews`? It's never mentioned anywhere else in your code. You can't load from a name you've never assigned to... – ShadowRanger Dec 14 '22 at 02:10
  • @JohnGordon, yes, it seems I can't create a variable the way I was thinking. Thanks – Marcio Bernardo Dec 14 '22 at 03:03
  • @ShadowRanger ```rio_reviews``` is one of the items in the list ```city_data```; but as John and Pranav pointed out, I can't create a variable this way. – Marcio Bernardo Dec 14 '22 at 03:05

2 Answers

1

I would suggest another approach: store the DataFrames in a dictionary instead of creating 12 separate variables.

import pandas as pd
from pathlib import Path

data = Path('../Data')

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}

for city in cities:
    dfs[city] = {}  # create the inner dict first, otherwise dfs[city][file] raises KeyError
    for file in files:
        dfs[city][file] = pd.read_pickle(data / f'{city}_{file}.pkl')

That gives you a nested dictionary, dfs, keyed first by city and then by file type, so you can access each DataFrame like this:

dfs['jc']['listings'].info()
dfs['rio']['reviews'].info()

... for example.
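
If you want to try the nested-dictionary idea without touching your real files, here is a minimal sketch. It assumes nothing about your actual data: it writes 12 tiny placeholder pickles (a single hypothetical `source` column) into a temporary folder that stands in for `../Data`, then loads them back. Note that the inner dictionary has to exist before you assign into it; `setdefault` is one way to handle that.

```python
import tempfile
from pathlib import Path

import pandas as pd

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)  # stand-in for Path('../Data')

    # Write 12 tiny placeholder pickles so the loop has something to read.
    for city in cities:
        for file in files:
            placeholder = pd.DataFrame({'source': [f'{city}_{file}']})
            placeholder.to_pickle(data / f'{city}_{file}.pkl')

    dfs = {}
    for city in cities:
        for file in files:
            # setdefault creates the inner dict the first time a city is seen
            dfs.setdefault(city, {})[file] = pd.read_pickle(data / f'{city}_{file}.pkl')

# Each DataFrame is reachable by city and file type.
print(sorted(dfs))
print(dfs['rio']['reviews'])
```

With your real data you would drop the temporary folder and the placeholder-writing loop and point `data` at `../Data`.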

We can further simplify the code using itertools.product:

import pandas as pd
from pathlib import Path
from itertools import product

data = Path('../Data')

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}

for city, file in product(cities, files):
    # setdefault creates the inner dict the first time a city is seen
    dfs.setdefault(city, {})[file] = pd.read_pickle(data / f'{city}_{file}.pkl')
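
If you would rather keep the original names such as `rio_reviews`, one possible variation is a flat dictionary keyed by the full file stem. The sketch below is only an illustration: the temporary folder and the one-column placeholder DataFrames are assumptions so that it runs on its own, not your real data.

```python
import tempfile
from itertools import product
from pathlib import Path

import pandas as pd

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)  # stand-in for Path('../Data')

    # Placeholder pickles, only so the example is self-contained.
    for city, file in product(cities, files):
        pd.DataFrame({'source': [f'{city}_{file}']}).to_pickle(data / f'{city}_{file}.pkl')

    # One flat dict: the key is the same string you would have used as a variable name.
    dfs = {
        f'{city}_{file}': pd.read_pickle(data / f'{city}_{file}.pkl')
        for city, file in product(cities, files)
    }

# If a plain variable is convenient, bind it explicitly.
rio_reviews = dfs['rio_reviews']
print(len(dfs))
print(rio_reviews)
```

An explicit assignment like `rio_reviews = dfs['rio_reviews']` keeps every name visible in the code, which is easier to follow than generating names at run time.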

  • ```for city in cities: for file in files: dfs[city][file] = pd.read_pickle(data / f'{city}_{file}.pkl') ``` Didn't work; I created a single list instead of doing ```[city][file]```; but having all the Data Frames inside of a dictionary raised other issues. When I tried to change values ```jc_cal['date'] = pd.to_datetime(jc_cal['date']);``` it raised ```A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead```; even doing ```jc_cal.loc[:,'date'] = pd.to_datetime(jc_cal.loc[:,'date']);``` – Marcio Bernardo Dec 14 '22 at 15:04
  • Update your question with the code you are trying now and the full error message you are getting. It would also be good to know what Python version you are using. – accdias Dec 14 '22 at 15:28
  • The solution was to run: ```for key, val in dfs.items() : exec(key + '=val')``` and work with 12 data frames. – Marcio Bernardo Dec 14 '22 at 16:03
  • Creating variables using `exec()` or `eval()` [**is a BAD practice**](https://stackoverflow.com/questions/1933451/why-should-exec-and-eval-be-avoided). You really should try to use a dictionary or a list instead. – accdias Dec 14 '22 at 19:01
-2

It looks like every DataFrame was assigned to the variable city, each one overwriting the last, and no variable named "rio_reviews" was ever defined. That is why you are getting this error.
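
Here is a small sketch of what happens, using placeholder strings instead of `pd.read_pickle` so it runs without any files (the placeholder text and the `loaded` dictionary are illustrative assumptions, not part of the original code):

```python
city_data = ['jc_listings', 'rio_reviews']

loaded = {}
for city in city_data:
    data_name = city
    # Placeholder for pd.read_pickle(...). This rebinds the name `city`;
    # it does not create a variable called jc_listings or rio_reviews.
    city = f'<contents of {data_name}.pkl>'
    # Storing the object under its name in a dict keeps it reachable.
    loaded[data_name] = city

# After the loop, `city` only holds whatever was loaded last.
print(city)

# The name rio_reviews was never assigned, so looking it up fails.
try:
    rio_reviews
    name_exists = True
except NameError:
    name_exists = False
print(name_exists)

# The dictionary still has every item, keyed by name.
print(loaded['rio_reviews'])
```

The strings in `city_data` are just values. Python never turns a string into a variable name on assignment, so a dictionary keyed by those strings is the usual way to look things up by name.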