0

Let's say as an example I have a folder Class with school classes and subfolders based on year and month. In the subfolders are tables with grades for each student of the class and the subjects. The grades table is in a daily format. So we have grades of all the students from one class for one day. The folder structure looks kind of like this:

.../Class/7a/2021/01/Grades_01012021
.../Class/7a/2021/01/Grades_02012021
.../Class/7a/2021/01/Grades_03012021
...
.../Class/7b/2021/01/Grades_01012021
.../Class/7b/2021/01/Grades_02012021
.../Class/7b/2021/01/Grades_03012021
...
.../Class/9/2022/02/Grades_01022022
.../Class/9/2022/02/Grades_02022022
.../Class/9/2022/02/Grades_03022022
...

The table Grades_01012021 for example contains subjects like Math, Physics, Chemistry as columns and the student names as rows with a date column as a timestamp like this:

Date Student Math Physics Chemistry ...
01-01-2021 John A- C B- ...
01-01-2021 Julian C A B+ ...
01-01-2021 Anna B A+ A- ...
01-01-2021 ... ... ... ... ...

I would like to combine the grade tables grouped by month and then by class. So I have a table with Grades from class 7a for January 2021, February 2021, March 2021 and so on. For 7b also a Grade table for january 2021, february 2021 and so on. In other words combining all the tables in 01, 02, 03 and so on based on the class.

Secondly, I would also like to have a combined table for all the tables from one class. Like a table for class 7a including 2021 and 2022.

I have tried so far to list all the files from the filepath .../Class/ and filter for "Grades_" but it combines all the grades tables disregarding the class and it does not group by month either. Any idea how I can combine the tables firstly by month for each class and by class overall?

Thanks and best regards!

  • Check out the answer to this question: https://stackoverflow.com/questions/31782763/how-to-use-regex-to-include-exclude-some-input-files-in-sc-textfile – ARCrow Jul 09 '22 at 18:40

1 Answers1

0

From what I understand, your CSV files do not contain the school class column. Hence, we can extract that from each file name.

Your code should look something like this

import os
import glob
import re

import pandas as pd

def read_file(path, **kwargs):
    class_ = re.search('Class/(?P<class>\d+[a-z]*/)', path).groupdict()['class']

    df = pd.read_csv(path, low_memory=False, **kwargs)
    df['Class'] = class_

    return df

# Assuming this file is in the `Class` folder
df = pd.concat([read_file(path, parse_dates=[0]) for path in glob.glob('**/*.csv'], ignore_index=True)

After this point, I am unsure what you mean by 'grouping the tables by month'. Do you perhaps want to take the average of the grades for each student? In which case you'll have to map the existing letter grades to numbers. After that, you can look into the pandas.resample method followed by the pandas.groupby method where you will pass in `['Class', 'Month'] as the first parameter.

As for the second part of the question where you wanted one table each class, you can change the glob expression. For example, to get all the data of class '7a' you can pass in the expression 'Class/7a/*.csv'.

Hope this helps.