
I managed to store 500 CSV files into an array; each CSV has these columns:

time_s | hbaro_m | hdot_1_mps | hralt_m | lon_rad | lat_rad | tas_mps | gs_mps | wow | chi_rad | lap_te_pos

where each column has about 5k to 10k rows of data.

Using this code

# import necessary libraries
import pandas as pd
import os
import glob
   
# use glob to get all the csv files 
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

# loop over the list of csv files
df = [0 for i in range(500)]
for i in range(500):
    # read the csv file
    df[i] = pd.read_csv(csv_files[i])

I get as output an array in which each element holds the data of one CSV (I don't know how to explain this better).

Like

df[0] = csv1,
df[1] = csv2

etc.

I want to modify and swap the columns (lon_rad, lat_rad) later. Is this possible, and how?

Thanks in advance

Do you want to save the results to CSV later, or only modify them in the array? In the latter case, look at https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns , where you can do something like: `df[i] = df[i][['col4', 'col3']]`. – Stanislav Ivanov May 17 '21 at 15:34
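
To illustrate that comment's suggestion with the actual column names from the question (a minimal sketch; it assumes every CSV has exactly the columns listed above, and that df is the list built by the question's code):

# reorder by selecting all columns, with lon_rad and lat_rad swapped
cols = ['time_s', 'hbaro_m', 'hdot_1_mps', 'hralt_m',
        'lat_rad', 'lon_rad',  # note: swapped
        'tas_mps', 'gs_mps', 'wow', 'chi_rad', 'lap_te_pos']
for i in range(len(df)):
    df[i] = df[i][cols]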

1 Answer


To be specific, your output df is a list of pd.DataFrame objects.
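
You can confirm this in an interactive session:

>>> type(df)
<class 'list'>
>>> type(df[0])
<class 'pandas.core.frame.DataFrame'>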

And the code breaks if there are fewer than 500 CSV files in the directory.

import pandas as pd
from pathlib import Path


def swap_columns(df, col1='lon_rad', col2='lat_rad'):
    """Return a DataFrame with col1 and col2 swapped."""
    columns = df.columns.to_list()
    try:
        col1_idx = columns.index(col1)
        col2_idx = columns.index(col2)
        # swap the column names
        columns[col1_idx], columns[col2_idx] = columns[col2_idx], columns[col1_idx]
    except ValueError:
        pass  # one of the columns is not found, maybe warn?
    return df[columns]


csv_files = list(Path.cwd().glob('*.csv'))
# read all csv files
dfs = [pd.read_csv(path) for path in csv_files]
# or read at most 500 csv files if you really want
# dfs = [pd.read_csv(path) for _, path in zip(range(500), csv_files)]
dfs = [swap_columns(df) for df in dfs]

I changed your variable name to dfs to show it holds multiple DataFrames, not just one.

NB this reads all CSV files into memory and then swaps the columns. If all you want to do is swap the columns in the CSV files, I would use a generator instead of a list, so you can manipulate each file one at a time. That way this CSV data-cleaning step isn't memory hungry and can be done on a small machine:

# a generator of tuples (Path, DataFrame)
dfs = ((path, pd.read_csv(path)) for path in csv_files)
dfs = ((path, swap_columns(df)) for path, df in dfs)
for path, df in dfs:
    df.to_csv(path)

Your other manipulations can be written as a function like swap_columns and get mapped over the list in the same way.
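
For example, a hypothetical radians-to-degrees step (the function and the conversion are illustrations, not something from the question; only the column names come from your data) maps over the list the same way:

import numpy as np

def radians_to_degrees(df, columns=('lon_rad', 'lat_rad')):
    """Return a copy with the given radian columns converted to degrees."""
    df = df.copy()
    for col in columns:
        if col in df.columns:
            df[col] = np.degrees(df[col])
    return df

dfs = [radians_to_degrees(df) for df in dfs]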

PS: this code adds the generated index as an extra column to the written CSV. Consider using the index_col parameter of read_csv.
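
For instance, either of these standard pandas options avoids the duplicated index on a round trip:

# drop the index when writing ...
df.to_csv(path, index=False)
# ... or treat the first column as the index when reading back
df = pd.read_csv(path, index_col=0)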

Chris Wesseling