1

I have 5 datasets, as CSV files, they each contain event logs on a computer, Monday-Friday.

So:

Monday.csv
Tuesday.csv
Wednesday.csv
Thursday.csv
Friday.csv

I was wondering how I could merge all of these together into one big file, each dataset, is identical in format with 80 columns as well as track of which day of the week it was, when looking at this larger dataset with all 5 days.

So all 5 csv's would become 1 bigger one like:

Week1.csv

Could this be possible with pandas? or would I need another library?

Update Import multiple csv files into pandas and concatenate into one DataFrame This helps me do it.

But my CSV files include the first row as a header, when I merge them it includes the same header 5 times through the document when the pdf's merge, is there a way to remove the first column from each one before you merge them?

desertnaut
  • 57,590
  • 26
  • 140
  • 166
  • 2
    Does this answer your question? [Import multiple csv files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) – Chris Mar 02 '21 at 00:42
  • @Chris It does thank you, my csv file includes the header files at the top of the CSV, when I merge them it copies the header file multiple times at the merge point of the CSV, is there a workaround to this? Like a way to remove the header file, from all the CSV's before merging them? – Denis Vecchiato Mar 02 '21 at 09:20

4 Answers4

0

Check the parameter "header" in pandas.read_csv: Row number(s) to use as the column names, and the start of the data

xkudsraw
  • 149
  • 12
0

It can be done with Pandas if you already use it:

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

dfs = (pd.read_csv(day + '.csv').assign(Weekday=day) for day in days)

pd.concat(dfs).to_csv('Week.csv')

But if can also be done without Pandas, because csv files are plain text files and you only need to add only column (and keep only one header). Assuming the delimiter is the comma (,):

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# extract one header
with open('Monday.csv', 'b') as fd:
    header = 'Weekday,' + next(fd)
with open('Week.csv', 'w') as fdout:
    fdout.write(header)                            # write the new header
    for day in days:                               # loop over the days
        with open(day + '.csv') as fdin:
            _ = next(fdin)                         # skip header
            for line in fdin:                      # and copy other lines
                fdout.write(day + ',' + line)
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
0

How about this?

import pandas as pd
import glob

path = r'C:\your_path_here' # use your path
all_files = glob.glob(path + "/*.csv")

# create list to append to
li = []

# loop through file names in the variable named 'all_files'
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, skiprows=1, header=o)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

Notice: pd.read_csv has an argument for skiprows=1

Check out this link.

https://www.listendata.com/2019/06/pandas-read-csv.html

ASH
  • 20,759
  • 19
  • 87
  • 200
0

You don't need pandas or anything that parses the CSV files. Just use fileinput.input:

import fileinput

files = ('Monday.csv', 'Tuesday.csv', 'Wednesday.csv', 'Thursday.csv', 'Friday.csv')
with fileinput.input(files=files) as infile, open('Week1.csv', 'w') as outfile:
    for line in infile:
        if fileinput.isfirstline() and fileinput.filename() != files[0]:
            continue        # skip the CSV header line of all files except the first
        print(line, end='', file=outfile)
         
mhawke
  • 84,695
  • 9
  • 117
  • 138