1

I have some code for joining excel files, how can I edit my code so that I don't have to type the full file name each time?

Thanks

Code below:

import pandas as pd

excel_names = ["Market_Information_System_Control_daily_trading_day_170701.xlsx",
               "Market_Information_System_Control_daily_trading_day_170702.xlsx",
               "Market_Information_System_Control_daily_trading_day_170703.xlsx",
               "Market_Information_System_Control_daily_trading_day_170704.xlsx",
               "Market_Information_System_Control_daily_trading_day_170731.xlsx"]

excels = [pd.ExcelFile(name) for name in excel_names]
frames = [x.parse(x.sheet_names[1], header=None,index_col=None) for x in excels]

frames[1:] = [df[1:] for df in frames[1:]]
combined = pd.concat(frames)
combined.to_excel("c.xlsx", header=False, index=False)

EDIT1: image

I just wan't the time to flow, not have heading breaks.

I think it has something to do with this: frames[1:] = [df[1:] for df in frames[1:]]

user8261831
  • 464
  • 1
  • 4
  • 20
  • Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. – Prune Jan 31 '19 at 00:15
  • I answered the question you asked. Your comments include a follow-up question about the results of later code -- a problem you didn't document. Please post one of these (long typing *or* frame concatenation) as a separate question, and supply the expected MCVE for the latter problem. – Prune Jan 31 '19 at 00:17
  • your answer didnt work unfortuantely – user8261831 Jan 31 '19 at 00:17
  • not sure if I get the idea wrong, but you are trying to join all excels into one excel? If it's in a folder, why not get every file in the folder so you do not have to type any part of the file name? – Tsuna Jan 31 '19 at 00:25
  • thats what I want to do, how do I do it? – user8261831 Jan 31 '19 at 00:27
  • @Prune could be error be due to this? `frames[1:] = [df[1:] for df in frames[5:]] ` – user8261831 Jan 31 '19 at 00:31
  • someone posted the answer before me, which is also a reference of https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory see if this is what you need – Tsuna Jan 31 '19 at 00:36

2 Answers2

2

Is this what you are looking for? which will print out all the file names that you just need to loop through the file names then combine

from os import listdir
from os.path import isfile, join

folder_path = './'

onlyfiles = [f for f in listdir(folder_path) if isfile(join(folder_path, f))]
print(onlyfiles);

P.S. this is from How do I list all files of a directory?

EDIT, in case you really want do some kind of file extension filtering, I am using json files as example, change it to your need

from os import listdir
from os.path import isfile, join

folder_path = './'

onlyfiles = [f for f in listdir(folder_path) if isfile(join(folder_path, f))]
filter_extension = []
my_ext = 'json'

for f in listdir(folder_path):
    extension = f.split('.')[-1]
    if (extension == my_ext):
        filter_extension.extend([f])

print(filter_extension)
Dora
  • 6,776
  • 14
  • 51
  • 99
  • so If i rename your 'only files' to `excel_names" it should work? – user8261831 Jan 31 '19 at 00:36
  • I only want to print out xlsx files though, not my python files? – user8261831 Jan 31 '19 at 00:37
  • this will work if I can modify the code for xlsx files only – user8261831 Jan 31 '19 at 00:38
  • @newtoR edit the `folder_path` to the dir where your xlsx are. If xlsx files are with other files. try checking all files and only save those with xlsx extension into another variable. But I would prefer you saving all xlsx into a separate folder which would be easier to manage – Dora Jan 31 '19 at 00:39
  • Oh true! that makes a lot of sense and it fixed it for me – user8261831 Jan 31 '19 at 00:42
  • one more question if you dont mind, how can I modify my code to delete the first 5 rows when importing the files, other than for the first file? – user8261831 Jan 31 '19 at 00:43
  • @newtoR I believe I get what you mean, I would suggest you move those files into a new folder let's say `already_imported`, this way the other folder is always having the new files that's not imported and also IF something really goes wrong. You still have the files that's already imported to check out instead of deleting them – Dora Jan 31 '19 at 00:48
  • I have fixed the importing files error, check my edited post to see my problem, thanks for all your help btw – user8261831 Jan 31 '19 at 00:50
  • @newtoR honestly I am not into excel combining, creating, answering your post due to the question of don't have to type the full file name each time. Would suggest you to create another post for that question instead of editing this one so it'll be more convenient for others having same issue as yours. – Dora Jan 31 '19 at 00:59
  • 1
    @newtoR by the way, if you think `[df[1:] for df in frames[1:]]` might be the reason, I would suggest using regular `for` loop which then you'll be able to do some `prints` to see what values are given which might cause the problem. – Dora Jan 31 '19 at 01:01
0

Something like this? Put the consistent parts into static variables; put the variable part into a list. Build with another list comprehension.

prefix = "Market_Information_System_Control_daily_trading_day_"
ext = ".xlsx"
dates  = ["170701",
          "170702",
          "170703",
          "170704",
          "170731"]
excel_names = [prefix + day + ext for day in dates]

print(excel_names)

Result:

['Market_Information_System_Control_daily_trading_day_170701.xlsx',
 'Market_Information_System_Control_daily_trading_day_170702.xlsx',
 'Market_Information_System_Control_daily_trading_day_170703.xlsx',
 'Market_Information_System_Control_daily_trading_day_170704.xlsx',
 'Market_Information_System_Control_daily_trading_day_170731.xlsx']
Prune
  • 76,765
  • 14
  • 60
  • 81
  • thanks, I have to do this for a couple years, for everymonth for every day. Is there a way to make it faster? – user8261831 Jan 31 '19 at 00:00
  • also why did you do `+day` what is `day`? Thanks – user8261831 Jan 31 '19 at 00:03
  • `day` is the index variable for the comprehension, `for day in dates`. – Prune Jan 31 '19 at 00:05
  • when i used your code, it added the first data file but not the rest? – user8261831 Jan 31 '19 at 00:07
  • `"everymonth for every day"` [sic] is a change to the problem you posted. Yes, you can use the `datetime` package to take a starting date, increment it each time through a loop, and then extract the YYMMDD form of the date. Convert that to a string. – Prune Jan 31 '19 at 00:07
  • @newtoR: the code I posted produces the desired list of strings, as shown in the extended answer. There's no way we can debug your usage when you haven't posted it. – Prune Jan 31 '19 at 00:11
  • my code prints that as well, but when I open my output excel file it hadn't joined everything, weird. – user8261831 Jan 31 '19 at 00:12