How to save a dataframe which changes in a loop

Question

I have several datasets in a folder and want to merge them into one. Before merging, I want to rename a column in each dataset so that all of them share a common column called "time" to merge on. In each iteration, df_original holds one dataset. After renaming its column, I want to store that dataset somewhere before the next iteration, because df_original is overwritten when the next file in the folder is read. How can I save the changed datasets so that I have all of them after the loop finishes?

Code

import os
import pandas as pd

complete_list = []
for i in range(len(files_to_merge)):
  df_original = pd.read_csv(os.path.join(folder, files_to_merge[i]))
  time_col = [j for j in df_original.columns if 'time' in j]
  df_original.rename(columns={time_col[0]: "time"}, inplace = True)
  complete_list.append(df_original)

To store a dataframe as CSV file there is the `to_csv` method. — Michael Butscher, Mar 22 '23 at 18:52
Can you elaborate on "save `df_original`"? Since it is renamed in-place already, the changes are saved to `df_original`. Where and why would you like to save this dataframe? — hc_dev, Mar 22 '23 at 18:54
I do not want to save it to .csv; I want to use it without importing the data again with read_csv. — Mahin GColab, Mar 22 '23 at 19:00
Maybe an example of given input (CSV-files) and expected (saved) output could help to show us the desired outcome. — hc_dev, Mar 22 '23 at 19:01
You already store the dataframes in memory in the `complete_list` for the later merging step. — Michael Butscher, Mar 22 '23 at 19:12
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Сергей Кох, Mar 23 '23 at 04:14

score 0 · Answer 1 · answered Mar 23 '23 at 20:24

The following adds what we asked for in the comments but is still missing from your question:

give input as example
clarify the task attempted in human language
show actual output, even if unexpected/erroneous

Provide an example of input

Suppose you are given multiple dataframes in a list, like these two:

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "start-time": [800, 1000, 900]})
df2 = pd.DataFrame({"B": [1, 2, 3], "salary": [1600, 1800, 1700]})

frames = [df1, df2]

Explain the task and expected behavior

Now you want to rename a column in all of them.

Rename-rule:

The column whose name contains "time" should be renamed to just "time".

def renamed_cols_inplace(df):
    cols_to_rename = [j for j in df.columns if 'time' in j]

    if len(cols_to_rename) == 0:
        print('ERROR: No matching column to rename. Found no column name containing "time" in dataframe!')
        return None
    elif len(cols_to_rename) > 1:
        print('WARN: Only first column renamed. Found more than 1 column name containing "time" in dataframe:', cols_to_rename)
    
    df.rename(columns={cols_to_rename[0]: "time"}, inplace = True)
    
    return cols_to_rename

Show actual output

When you apply this function to the input

# input is a list of dataframes
for df in frames:
    cols = renamed_cols_inplace(df)
    print('INFO: Count of columns renamed: ', len(cols) if cols else 0)

# output is the same list of dataframes
for df in frames:
    print(df)  # but the column-name changed

then you get the following output:

INFO: Count of columns renamed:  1
ERROR: No matching column to rename. Found no column name containing "time" in dataframe!
INFO: Count of columns renamed:  0
   A  time
0  1   800
1  2  1000
2  3   900
   B  salary
0  1    1600
1  2    1800
2  3    1700
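As a minimal sketch of why nothing extra needs to be "saved": an in-place rename changes the dataframe object itself, and the list only holds references to those objects, so the list still contains the renamed dataframes after the loop. The single example frame here is an assumption for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "start-time": [800, 1000, 900]})
frames = [df1]

for df in frames:
    # Same rename rule as above: first column whose name contains 'time'
    time_cols = [c for c in df.columns if "time" in c]
    if time_cols:
        df.rename(columns={time_cols[0]: "time"}, inplace=True)

# The list element and df1 are the same object, so both show the new name
same_object = frames[0] is df1
columns_after = list(frames[0].columns)
print(same_object)    # True
print(columns_after)  # ['A', 'time']
```

After the loop, any renamed dataframe can be retrieved by index, for example frames[0].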

What you did not ask for specifically

Read the input

Read multiple dataframes from a list of CSV files, specified by names in a given folder.

import os
import pandas as pd

file_names = ['file_A.csv', 'file_B.csv']
folder = 'my_folder'

files_to_merge = [os.path.join(folder, f) for f in file_names]
print('Files to merge:', files_to_merge)

frames = []
for f in files_to_merge:
    df = pd.read_csv(f)
    frames.append(df)

print('Read number of dataframes to merge:', len(frames))

Note: here we build two lists using two different approaches:

the list of files to read, using a list comprehension: [os.path.join(folder, f) for f in file_names]
the list of dataframes read from those files, using a simple for-loop with append
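For comparison, the dataframes could also be collected with a list comprehension, or in a dictionary keyed by file name if you want to look them up by name later. This is only a sketch: the temporary folder and the CSV contents are made up so that the example runs on its own, and frames_by_name is a hypothetical name.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: write two tiny CSV files into a temporary folder
folder = tempfile.mkdtemp()
file_names = ['file_A.csv', 'file_B.csv']
pd.DataFrame({"A": [1, 2], "start-time": [800, 1000]}).to_csv(
    os.path.join(folder, file_names[0]), index=False)
pd.DataFrame({"B": [3, 4], "end-time": [900, 1100]}).to_csv(
    os.path.join(folder, file_names[1]), index=False)

files_to_merge = [os.path.join(folder, f) for f in file_names]

# Same result as the for-loop with append, written as a list comprehension
frames = [pd.read_csv(f) for f in files_to_merge]

# Alternative: keep each dataframe under its file name
frames_by_name = {name: pd.read_csv(path)
                  for name, path in zip(file_names, files_to_merge)}

print('Read number of dataframes to merge:', len(frames))
```

The list keeps the order of the files, while the dictionary makes it easier to tell afterwards which dataframe came from which file.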

Merge or Append or Concat the output

Searching for "merge dataframes" turns up, for example:

pandas three-way joining multiple dataframes on columns

You can also search for "append dataframes" or "concat dataframes", and filter the results by the tags [python] [pandas].
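A rough sketch of the two usual options, assuming every dataframe already has the common "time" column. The three example frames and their values are invented for illustration. Note that DataFrame.append was removed in pandas 2.0, so pd.concat is used for stacking rows.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that already share a "time" column after renaming
df1 = pd.DataFrame({"time": [800, 900, 1000], "A": [1, 2, 3]})
df2 = pd.DataFrame({"time": [800, 900, 1100], "B": [4, 5, 6]})
df3 = pd.DataFrame({"time": [800, 1000, 1100], "C": [7, 8, 9]})
frames = [df1, df2, df3]

# Column-wise join: merge pairwise on "time", keeping every time value (outer join)
merged = reduce(
    lambda left, right: pd.merge(left, right, on="time", how="outer"),
    frames,
)

# Row-wise stacking: rows of all frames are appended, missing columns become NaN
stacked = pd.concat(frames, ignore_index=True)

print(merged)
print(stacked)
```

Which one fits depends on the data: merge lines up rows that share the same "time" value, while concat simply puts the datasets underneath each other.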