0

So what i want to do is select a column and copy the values just under the same column i select, i know i can use pandas dataframe to select the column just by the name of it, but i dont know if it's better to use openpyxl instead. There are many similar question about this but no one answer my question. Here is my code where i try to use dataframes and numpy:

for file in files:
        fileName = os.path.splitext(file)[0]
        if fileName == 'fileNameA':
            df = pd.read_excel(file)
            list_dates = ['the string of the date i need' for dates in df['Date']]
            # Here what happend is 
            # that for every date it generates a list with dates
            print(list_dates)
            new_df = df.loc[np.repeat(df['Dates'], len(list_dates)]
            writer = pd.ExcelWriter('fileNameA1.xlsx', engine='xlsxwriter')
            new_df.to_excel(writer, 'Sheet 1')
            writer.save()
except Exception as e:
    print(e)

#Input data:
Date
01/12/2018
02/12/2018
03/12/2018
04/12/2018

#Output i want:
Date
01/12/2018
02/12/2018
03/12/2018
04/12/2018
01/12/2018
02/12/2018
03/12/2018
04/12/2018

Which is the best alternative, working directly with openpyxl or using pandas and then use a writer to generate the xlsx?

In this question they use df_try or concat() but how do i know the number of time i should repeat it.

2 Answers2

0

Just use NewDF = pd.concat([df, df]) This will duplicate all rows of df.

If you're trying to duplicate your rows three times or some other odd interval, you could just mash together a temporary df to get the desired results (for adding two copies of df, use the following):

tempdf = pd.concat([df, df])

NewDF = pd.concat([df, tempdf])

Dylan Smith
  • 118
  • 1
  • 10
0

Best is usually too subjective to be any good and it is for this reasons that questions asking for library recommendations will be closed.

If you're not doing any real manipulation of the data for statistical purposes, etc. then you probably don't need Pandas. Sticking with a single library can mean your code is easier to understand and maintain.

One approach in openpyxl would allow you to simply append() the dates at the end of the current worksheet. Something like this: (the code will probably need some changes).

for row in ws:
   ws.append(row[:1])
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55