
I have a bunch of CSV files, each containing data for a specific time, and the time is encoded in the filename:

time1.csv
Label val1 val2
a      5    6
b      6    4

time2.csv
Label val1 val2
a      5    6
c      6    4

...

I can read each file into a Pandas DataFrame. I then want to add a "time" column (derived from the filename) to each DataFrame, and merge all of the DataFrames into a single DataFrame.

Is there a way to accomplish this?

Michael Szczesny
k2k2e6
  • What value should the time column contain? You should add the expected result to your question; as it stands, it's hard to say what you really want. – Valentino Oct 10 '20 at 22:53

2 Answers


Given the list of filenames, you can assign a time column to equal the filename (excluding the last four characters, i.e. .csv) and concatenate the result.

import pandas as pd

df = pd.concat([pd.read_csv(filename).assign(time=filename[:-4])  # drop the '.csv' suffix
                for filename in filenames])
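
If the list of filenames isn't already in hand, one way to build it (a sketch, assuming the CSVs sit in the current working directory and follow the time*.csv pattern) is to collect them with glob and strip the extension with os.path.splitext instead of slicing off the last four characters:

import glob
import os
import pandas as pd

# Assumption: all per-time CSVs live in the working directory and match time*.csv
filenames = sorted(glob.glob('time*.csv'))

# splitext drops the '.csv' suffix regardless of its length
df = pd.concat([pd.read_csv(f).assign(time=os.path.splitext(os.path.basename(f))[0])
                for f in filenames])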
Alexander

You can create a list of (time, DataFrame) pairs, then iterate over it and append each frame to finaldf:

import pandas as pd

finaldf = pd.DataFrame()
for timename, df in times:  # times: list of (time, DataFrame) pairs
    finaldf = finaldf.append(df.assign(time=timename), ignore_index=True)

print(finaldf)

To do it more automatically, you can put the filenames in a list and read them inside the loop as well, like this:

import pandas as pd

finaldf = pd.DataFrame()
times = ['time1', 'time2']
for csv in times:
    df = pd.read_csv('{}.csv'.format(csv))  # read time1.csv, time2.csv, ...
    finaldf = finaldf.append(df.assign(time=csv), ignore_index=True)

print(finaldf)

The result would be:

Label val1 val2   time
a      5    6     time1
b      6    4     time1
a      5    6     time2
c      6    4     time2
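
If the number of files grows, a variant (a sketch, reusing the same time1/time2 names) that avoids the repeated copying raised in the comments below is to build all the frames first and concatenate once at the end:

import pandas as pd

times = ['time1', 'time2']
# Build every frame first, then concatenate a single time instead of appending in the loop
frames = [pd.read_csv('{}.csv'.format(csv)).assign(time=csv) for csv in times]
finaldf = pd.concat(frames, ignore_index=True)
print(finaldf)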
Mehdi Golzadeh
  • Isn't the request to merge or concat the DataFrames? – Joe Ferndz Oct 10 '20 at 23:02
  • I added the final output of my code. You can also use "concat". merge is used to join two dfs on some columns, not to concatenate them by columns or rows. – Mehdi Golzadeh Oct 10 '20 at 23:07
  • This solution is known as a quadratic copy and grows exponentially slower: https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower – Alexander Oct 10 '20 at 23:15
  • @Alexander I don't think it gets slower for such a small number of files; it may slow down for loops over 10000 iterations. But your solution works fine as well. – Mehdi Golzadeh Oct 10 '20 at 23:21
  • Create 100 files and time the results between the two methods. It will become clear why one should avoid appending to dataframes within a loop. – Alexander Oct 10 '20 at 23:24
  • @k2k2e6 you can use finaldf = pd.concat([finaldf, df]) in the provided code – Mehdi Golzadeh Oct 10 '20 at 23:25
  • No, that won’t make it any faster. I suggest you read the link I provided. – Alexander Oct 11 '20 at 00:15