How to get the folder name from path string and add it to a new column in pandas dataframe?

Question

I want to read the folder names from a tar.gz file and create a column that contains those names.

I'm using this code:

import tarfile

import pandas as pd

file_path = r"C:\Users\filename.tar.gz"
start_with = './mainfolder/'

with tarfile.open(file_path, "r:*") as tar:
    # Keep only the CSV files located under the main folder
    csv_path = [n for n in tar.getnames() if n.endswith('.csv') and n.startswith(start_with)]

    csv_list = []
    for file in csv_path:
        # Read each CSV straight from the archive
        df_temp = pd.read_csv(tar.extractfile(file))
        csv_list.append(df_temp)

    df = pd.concat(csv_list)

The main folder contains several named folders. After reading a csv file from folder "X" (for example), a "FolderName" column should be added to its data, containing the folder name ("X") in every row. The same applies to every csv file.

An example path string: ./mainfolder/1001_name or ./mainfolder/1002_some_name

It's just reading the csv files and concatenating them. But I want to add the folder name (which the file comes from) as a column. — qwerty, Nov 03 '19 at 19:16

Rithin Chalumuri · Accepted Answer · 2019-11-03T19:36:30.583


After the following line:

df_temp = pd.read_csv(tar.extractfile(file))

You can get the folder path from the file path string using os.path.dirname(), and then take its last component with os.path.basename().

You'll need to import the os module.

Example:

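import os

# returns ./mainfolder/1001_name
full_folder_path = os.path.dirname(file)

# returns 1001_name
folder = os.path.basename(full_folder_path)

# returns the part after the first underscore, e.g. name
# (raises ValueError if the folder name contains no underscore)
result = folder[folder.index('_')+1:]

# assign the same value to every row of this file's DataFrame
df_temp['FolderName'] = result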

This creates a new column called FolderName and sets the same value for all rows.
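Putting the pieces together, the whole loop might look like the sketch below. It is only an illustration under a few assumptions: the helper names read_csvs_with_folder and make_demo_archive are hypothetical, the archive is a small in-memory one with made-up CSV contents so the example runs without a real file, and the full folder name (for example 1001_name) is stored rather than only the part after the underscore. It uses posixpath instead of os.path because tar member names use "/" as the separator.

```python
import io
import posixpath
import tarfile

import pandas as pd


def read_csvs_with_folder(tar, start_with="./mainfolder/"):
    """Read every CSV under start_with and tag its rows with the parent folder name."""
    frames = []
    for member in tar.getmembers():
        name = member.name
        if not (member.isfile() and name.endswith(".csv") and name.startswith(start_with)):
            continue
        # './mainfolder/1001_name/a.csv' -> '1001_name'
        folder = posixpath.basename(posixpath.dirname(name))
        df_temp = pd.read_csv(tar.extractfile(member))
        df_temp["FolderName"] = folder  # same value for every row of this file
        frames.append(df_temp)
    return pd.concat(frames, ignore_index=True)


def make_demo_archive():
    """Build a tiny in-memory tar.gz with hypothetical data, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path, text in [
            ("./mainfolder/1001_name/a.csv", "x\n1\n2\n"),
            ("./mainfolder/1002_some_name/b.csv", "x\n3\n"),
        ]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=make_demo_archive(), mode="r:*") as tar:
    df = read_csvs_with_folder(tar)

print(df)
```

With a real archive, the last block would open file_path instead of the demo buffer.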

edited Nov 03 '19 at 19:36

answered Nov 03 '19 at 19:16

Rithin Chalumuri


@qwerty, you can split the value by `_` and keep the last bit :) – Rithin Chalumuri Nov 03 '19 at 19:27
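For folder names with more than one underscore, slicing after the first underscore and keeping the last piece of a split give different results. A small sketch of both variants follows; the function names are hypothetical, and the fallback of returning the whole name when there is no underscore is an assumption, not something stated in the thread.

```python
def after_first_underscore(folder):
    # Same idea as folder[folder.index('_') + 1:], but returns the whole
    # name instead of raising ValueError when there is no underscore.
    head, sep, tail = folder.partition("_")
    return tail if sep else folder


def after_last_underscore(folder):
    # "Split by _ and keep the last bit".
    return folder.split("_")[-1]


for folder in ["1001_name", "1002_some_name", "plain"]:
    print(folder, after_first_underscore(folder), after_last_underscore(folder))
```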

@qwerty, just posted a possible answer for that. https://stackoverflow.com/questions/58673386/aggregate-by-id-and-date-and-get-maximum-and-mode-values-for-specified-columns/58684885#58684885 – Rithin Chalumuri Nov 03 '19 at 21:59
