
I'm reading a bunch of CSV files into one large DataFrame. The code below works, but it gives a strange warning, and I'm not sure whether I need to do anything about it.

import pandas as pd
import glob

# List of folders

folders = [
(202206, r"\\a..."),
(202207, r"\\a..."),
(202208, r"\\a..."),
(202209, r"\\a..."),
(202210, r"\\a..."),
(202211, r"\\a..."),
(202212, r"\\a..."),
(202301, r"\\a...")
]

columns = ['A','B','C','D']

topline = 3

# Loop through folders and append each mpf into a dataframe

list_df = []

for folder in folders:
    csv_files = glob.glob(folder[1])
    for file in csv_files:
        temp_df = pd.read_csv(file, header=topline, skip_blank_lines=True, usecols=columns)
        # ~ negates the isna() mask, which drops the NaN/junk rows at the bottom of the file
        df = temp_df[~temp_df[columns[0]].isna()]
        df['period'] = folder[0]
        list_df.append(df)
    
data = pd.concat(list_df, axis=0, ignore_index=True)

It gives the following warning:

<ipython-input-17-79f98aa2c09a>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['period'] = folder[0]

Can someone explain what this means and whether I should be concerned? I read the linked page but couldn't make sense of it or see how it relates to what I'm doing.

Zain
  • 1
    This is just a warning: you are setting values on a copy of a slice of a DataFrame, which can be a source of bugs later in your program. I will write an answer. – Saxtheowl Mar 18 '23 at 15:18
  • 2
    This is a very common question that everyone encounters sooner or later. This discussion has many good answers on how to avoid it: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – Vincent Rupp Mar 18 '23 at 15:22

1 Answer


The warning appears because you are setting values on what pandas thinks may be a copy of a slice of a DataFrame. It is only a warning, but this pattern can be a source of bugs later in your program, because the assignment may not end up on the DataFrame you expect.

To avoid the warning, use .loc to set the value directly on the original DataFrame and then filter. That way you can be sure the value is set on the original DataFrame rather than on a copy of a slice:

temp_df.loc[~temp_df[columns[0]].isna(), 'period'] = folder[0]
df = temp_df[~temp_df[columns[0]].isna()]
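Another common way to silence the warning, sketched here under some assumptions, is to take an explicit copy of the filtered rows before adding the column. The in-memory CSV text and the `period` variable below are hypothetical stand-ins for one of your files and for `folder[0]`; the `header` and `skip_blank_lines` arguments are left out only to keep the sample small. Because the new column is created on an independent DataFrame, `temp_df` is left untouched, and `period` should stay an integer column rather than being filled with NaN for the junk rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for one CSV file; the last row mimics the junk at the bottom.
csv_text = "A,B,C,D\n1,2,3,4\n5,6,7,8\n,,,\n"

columns = ['A', 'B', 'C', 'D']
period = 202206  # stands in for folder[0] in the original loop

temp_df = pd.read_csv(io.StringIO(csv_text), usecols=columns)

# .copy() makes df an independent DataFrame, so adding a column is unambiguous
df = temp_df[temp_df[columns[0]].notna()].copy()
df['period'] = period

# Same idea in one expression: dropna() and assign() each return a new DataFrame
df2 = temp_df.dropna(subset=[columns[0]]).assign(period=period)
```

Either `df` or `df2` can then be appended to `list_df` exactly as before.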
Saxtheowl