
I'm working with a Python script that reads several CSV files from a folder and merges their data. The merge itself works; the problem is the order in which the files are sorted.

I found a similar, useful question and tried the answers to it, but they didn't work.

I do get the final file, but the sort doesn't behave as I expect. I'm trying to sort on the numeric part of each file name. I also included an image from my console:

How can I resolve this issue?

My code is the following:

import pandas as pd
import os
import glob
import numpy as np
import re
from os import listdir


#files = glob.glob1('./separa_0-60/', '*' + '.csv')
# if you want sort files according to the digits included in the filename, you can do as following:
#data_files = sorted(files, key=lambda x:float(re.findall("(\d+)",x)[0]))

#data_files = sorted(glob.glob('./separa_0-60/resultados_nodos_*.csv'))
data_files = sorted(glob.glob('./separa_0-60/resultados_nodos_*.csv'), key=lambda x: float(re.findall(r"(\d+)",x)[0]))
#print(files)
print(data_files)

mergeddata = pd.concat(pd.read_csv(datafile, sep=';')
             for datafile in data_files)

keep_col = [
    "node_code",
    "throughput[Mbps]",
    "node_code.1",
    "throughput[Mbps].1"
]

mergeddata2 = mergeddata[keep_col]

print(mergeddata2)
mergeddata2.to_csv('resul_nodos_final_separa0-60.csv', index=False)

I very much appreciate all the help, regards!

accdias

1 Answer


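The problem is that the directory name "separa_0-60" contains digits. Your key runs findall over the whole path, so its first match is always the "0" from the directory name. Every file gets the same sort key, and the order returned by glob is left unchanged. It is better to search specifically for the number in the file name.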

data_files = sorted(glob.glob('./separa_0-60/resultados_nodos_*.csv'),
    key=lambda x: float(re.search(r"resultados_nodos_(\d+)\.csv$", x).group(1)))
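If you prefer a named key function over the lambda, here is a minimal sketch of the same idea. It strips the directory first, so the digits in "separa_0-60" can never match. The helper name file_number and the three sample paths are made up for illustration; only the resultados_nodos_<N>.csv pattern comes from your question.

```python
import os
import re

def file_number(path):
    """Return the number embedded in a 'resultados_nodos_<N>.csv' file name."""
    # Drop the directory part so digits in 'separa_0-60' are ignored.
    name = os.path.basename(path)
    match = re.fullmatch(r"resultados_nodos_(\d+)\.csv", name)
    if match is None:
        # Fail loudly instead of silently sorting on the wrong number.
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1))

# Hypothetical paths standing in for the result of glob.glob(...).
paths = [
    './separa_0-60/resultados_nodos_10.csv',
    './separa_0-60/resultados_nodos_2.csv',
    './separa_0-60/resultados_nodos_1.csv',
]

# What the original key sees: the first digits found in the whole path.
print(re.findall(r"\d+", paths[0]))  # ['0', '60', '10']

# Sorting on the number in the file name gives 1, 2, 10.
ordered = sorted(paths, key=file_number)
print([os.path.basename(p) for p in ordered])
```

In your script this would be used as sorted(glob.glob('./separa_0-60/resultados_nodos_*.csv'), key=file_number). Using int rather than float is just a choice here, since the file numbers are whole numbers.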
tdelaney