I want to read the folder names from a tar.gz file and create a column that contains those names.
This is the code I'm using:
import tarfile

import pandas as pd

file_path = r"C:\Users\filename.tar.gz"
start_with = './mainfolder/'

csv_list = []
with tarfile.open(file_path, "r:*") as tar:
    # Keep only the CSV members that live under the main folder.
    csv_path = [n for n in tar.getnames() if n.endswith('.csv') and n.startswith(start_with)]
    for file in csv_path:
        # Read each CSV straight from the archive, without extracting to disk.
        df_temp = pd.read_csv(tar.extractfile(file))
        csv_list.append(df_temp)
df = pd.concat(csv_list)
The main folder contains several named subfolders. After reading a csv file from folder "X" (for example), a "FolderName" column should be added to that file's data, holding the folder name ("X") in every row. The same should happen for every csv file.
An example of a path string: ./mainfolder/1001_name
or ./mainfolder/1002_some_name
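One possible way to get the desired "FolderName" column is sketched below. It is only a sketch: it assumes every csv sits inside a subfolder of ./mainfolder/ (for example ./mainfolder/1001_name/data.csv, where data.csv is a made-up file name), and the helper names folder_name and read_csvs_with_folder are hypothetical, introduced here for illustration.

```python
import tarfile

import pandas as pd


def folder_name(member_path, start_with="./mainfolder/"):
    """Return the first path component below start_with, e.g. '1001_name'.

    Assumes member_path begins with start_with and that the csv is inside
    a subfolder; a csv placed directly in the main folder would return
    its own file name instead.
    """
    relative = member_path[len(start_with):]
    return relative.split("/", 1)[0]


def read_csvs_with_folder(file_path, start_with="./mainfolder/"):
    """Read every csv under start_with and tag its rows with the folder name."""
    frames = []
    with tarfile.open(file_path, "r:*") as tar:
        for name in tar.getnames():
            if not (name.startswith(start_with) and name.endswith(".csv")):
                continue
            df_temp = pd.read_csv(tar.extractfile(name))
            # A scalar assignment fills the whole column with the same value.
            df_temp["FolderName"] = folder_name(name, start_with)
            frames.append(df_temp)
    # pd.concat raises ValueError if no csv matched, so frames must be non-empty.
    return pd.concat(frames, ignore_index=True)
```

Calling read_csvs_with_folder(file_path) would then give one DataFrame in which rows read from ./mainfolder/1001_name/ carry "1001_name" in "FolderName", rows from ./mainfolder/1002_some_name/ carry "1002_some_name", and so on.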