0

I have a csv file which contains approximately 100 columns of data. Each column represents temperature values taken every 15 minutes throughout the day for each of the 100 days. The header of each column is the date for that day. I want to convert this into two columns, the first being the date time (I will have to create this somehow), and the second being the temperatures stacked on top of each other for each day.

My attempt:

 with open("original_file.csv") as ofile:  
    stack_vec = []
    next(ofile) 
    for line in ofile:             
        columns = lineo.split(',') # get all the columns         
        for i in range (0,len(columns)):
            stack_vec.append(columnso[i])
            np.savetxt("converted.csv",stack_vec, delimiter=",", fmt='%s') 

In my attempt, I am trying to create a new vector with each column appended to the end of it. However, the code is extremely slow and likely not working! Once I have this step figured out, I then need to take the date from each column and add 15 minutes to the date time for each row. Any help would be greatly appreciated.

Sjoseph
  • 853
  • 2
  • 14
  • 23
  • For .csv I always recommend use the numpy function "genfromtext". csvData = genfromtxt('original_file.csv', delimiter=',') Afterwards you can work with csvData like any other numpy array without having to worry about things like end-of-line character and so on. Also, it should be quite fast then. What do you mean by "likely not working"? – offeltoffel Jul 10 '17 at 10:55
  • Thanks for the tip. By likely not working, I mean the program is running but not completing. I terminate it after about 5 minutes or so. – Sjoseph Jul 10 '17 at 11:04
  • I see. One thing that strikes me is that you have np.savetxt within your loop over all 100 columns. So the same file is saved again and again. Anyway, you should consider Marvin's answer. He uses genfromtext as well and explains in detail how you would use it in your situation. – offeltoffel Jul 10 '17 at 11:11

1 Answers1

1

If i got this correct you have a csv with 96 rows and 100 Columns and want to stack in into one vector day after day to a vector with 960 entries , right ?

An easy approach would be to use numpy:

    import numpy as np 

    x = np.genfromtxt('original_file.csv', delimiter=',')
    data = x.ravel(order ='F')

Note numpy is a third party library but the go-to library for math. the first line will read the csv into a ndarray which is like matrix ( even through it behaves different for mathematical operations)

Then with ravel you vectorize it. the oder is so that it stacks rows ontop of each other instead of columns, i.e day after day. (Leave it as default / blank if you want time point after point)

For your date problem see How can I make a python numpy arange of datetime i guess i couldn't give a better example.

if you have this two array you can ensure the shape by x.reshape(960,1) and then stack them with np.concatenate([x,dates], axis = 1 ) with dates being you date vector.