0

I'm trying to merge my 119 csv files into one file through a python code. The only issue I'm facing is that even though I've applied the sort method it isnt working and my files are not ordered , which is causing the date column to be un-ordered. Below is the code, when I run this and open my new csv file "call.sms.merged" it appears that after my 1st csv file, data is inserted or merged from the 10th csv then 100th csv till 109 csv & then it starts to begin from csv 11. I'm attaching an image for better understanding.

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
csv_merged = pd.concat(csv_list, ignore_index=True)
csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)

UN-SORTED DATA

Incorrect order of csv

un-ordered

Python Code and Error : Python Code Screenshot

Error Screenshot

1 Answers1

0

You can extract the number of each call/file with pandas.Series.str.extract then use pandas.DataFrame.sort_values to make an ascending sort along this column/number.

Try this :

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []

for file in file_list:
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
    
csv_merged = (
                pd.concat(csv_list, ignore_index=True)
                    .assign(num_call= lambda x: x["File_Name"].str.extract("(\d{1,})", expand=False).astype(int))
                    .sort_values(by="num_call", ignore_index=True)
                    .drop(columns= "num_call")
             )

csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • Thank you for your response. I've tried this and I'm getting a ValueError: No objects to concatenate and ValueError: Indexes have overlapping values: ['a'] – Maryam Faheem Dec 05 '22 at 07:47
  • That's weird because I did not change anything in your code before `pandas.concat`. Can you share a screenshot of the full message error and the code used ? – Timeless Dec 05 '22 at 07:49
  • yes sure, I'll attach an image – Maryam Faheem Dec 05 '22 at 07:52
  • Can you insert a cell before the last one and `print(len(csv_list))` ? We need to make sure that the list of dataframes are not empty. – Timeless Dec 05 '22 at 08:00
  • I did and its empty I'm getting a 0. – Maryam Faheem Dec 05 '22 at 08:01
  • Try to re-run each cell from the start and make sure that the list of dataframes is not empty before running the last cell. By the way, I made a small edit on my answer, check it out. – Timeless Dec 05 '22 at 08:03
  • So, I've made sure the dataframe isnt empty and I'm getting 116 as its length but when I run your pd.concat code its giving a ValueError: Pattern contains no capture groups – Maryam Faheem Dec 05 '22 at 08:27
  • It seems that you did not consider my update. I made earlier a small edit on the `extract` function/line. – Timeless Dec 05 '22 at 08:28
  • No worrires, glad it helped. – Timeless Dec 05 '22 at 08:34