Rearranging .csv data using Python

Question

I have a csv file that contains data that has each days data split into a separate column.

'Time', 'Sun 01', 'Mon 02', 'Tue 03', 'Wed 04', 'Thu 05', 'Fri 06', 'Sat 07', 'Sun 08', 'Mon 09', 'Tue 10', 'Wed 11', 'Thu 12', 'Fri 13', 'Sat 14', 'Sun 15', 'Mon 16', 'Tue 17', 'Wed 18', 'Thu 19', 'Fri 20', 'Sat 21', 'Sun 22', 'Mon 23', 'Tue 24', 'Wed 25', 'Thu 26', 'Fri 27', 'Sat 28', 'Sun 29', 'Mon 30'
'00:00-00:05', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.10', '0.20', '0.20', '0.20', '0.10', '0.20', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.10'
'00:05-00:10', '0.30', '0.30', '0.30', '0.30', '0.30', '0.50', '0.20', '0.10', '0.10', '0.20', '0.10', '0.30', '0.10', '0.20', '0.30', '0.10', '0.20', '0.10', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.10'
'00:10-00:15', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.20', '0.30', '0.10', '0.30', '0.30', '0.20', '0.10', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.20', '0.10'
'00:15-00:20', '0.30', '0.30', '0.30', '0.30', '0.40', '0.50', '0.10', '0.10', '0.10', '0.20', '0.10', '0.30', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.00'
'00:20-00:25', '0.30', '0.30', '0.40', '0.40', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.10', '0.30', '0.10', '0.30', '0.30', '0.10', '0.20', '0.10', '0.20', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.20'

Using python, is there a way to rearrange the data so that each days data is added to the end of the previous days data having one long collumn?

Example:

Date, Time, Value,
01-01-2000, 00:00, 0.01
01-01-2000, 00:00, 0.01
01-01-2000, 00:05, 0.01
01-01-2000, 00:10, 0.01
02-01-2000, 00:00, 0.01
02-01-2000, 00:05, 0.01
02-01-2000, 00:10, 0.01

I'm getting stuck trying to recurse through the data. If I set the data from the csv to a variable I lose the separate lists and am not sure how to separate the data again so that I can append each day to the bottom of the new csv. Is there a method that can be used to store csv data in a variable that will maintain separate lists for each row?

So far I have:

import csv
month_year = "01-2000"
filename = 'test.csv'

converted_data = "converted_" + filename
cols = ['Time', 'Date(dd-mm-yyyy', 'kWh']

interval_count = 0
day = 1

with open(converted_data, 'w') as csvfile:
    csvwriter = csv.writer(csvfile)

    csvwriter.writerow(cols)

    with open(filename, 'r') as csvfile:
        data = csv.reader(csvfile)
        next(data)

        for line in data:
            total_count = len(line[1:]) * 288       # 288 = amount of 5 min intervals in 24 hours

            time_full = line[0]
            time_clean = (time_full[:5])
            if day <= 9:
                date = "0{0}{1}".format(day, month_year)
            else:
                date = "{0}{1}".format(day, month_year)
            # print(line)
            row = [time_clean, date, line[day]]
            print(row)
            csvwriter.writerow(row)
            interval_count += 1
            if interval_count % 288 == 0:
                day += 1
                interval_count = 0

Any help would be greatly appreciated.

Pandas package allows this - see [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe). If pandas is ok, there's no need for you to reinvent the wheel. Also, there's no need for the code to be recursive, I untagged. — smci, Jul 22 '18 at 02:25

Patrick Artner · Answer 1 · 2018-07-19T09:03:42.933

Ive commented inside my code. Essentially you use zip() to get a column-wise view on your data. Then apply some logic to spice up the output with fimes for each day. Then you write all the data into your output file:

# First we create your data file to your specification:

with open("d.txt","w") as w:
    w.write("""'Time', 'Sun 01', 'Mon 02', 'Tue 03', 'Wed 04', 'Thu 05', 'Fri 06', 'Sat 07', 'Sun 08', 'Mon 09', 'Tue 10', 'Wed 11', 'Thu 12', 'Fri 13', 'Sat 14', 'Sun 15', 'Mon 16', 'Tue 17', 'Wed 18', 'Thu 19', 'Fri 20', 'Sat 21', 'Sun 22', 'Mon 23', 'Tue 24', 'Wed 25', 'Thu 26', 'Fri 27', 'Sat 28', 'Sun 29', 'Mon 30'
'00:00-00:05', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.10', '0.20', '0.20', '0.20', '0.10', '0.20', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.10'
'00:05-00:10', '0.30', '0.30', '0.30', '0.30', '0.30', '0.50', '0.20', '0.10', '0.10', '0.20', '0.10', '0.30', '0.10', '0.20', '0.30', '0.10', '0.20', '0.10', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.10'
'00:10-00:15', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.20', '0.30', '0.10', '0.30', '0.30', '0.20', '0.10', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.20', '0.10'
'00:15-00:20', '0.30', '0.30', '0.30', '0.30', '0.40', '0.50', '0.10', '0.10', '0.10', '0.20', '0.10', '0.30', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.00'
'00:20-00:25', '0.30', '0.30', '0.40', '0.40', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.10', '0.30', '0.10', '0.30', '0.30', '0.10', '0.20', '0.10', '0.20', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.20'
""")

Then we read it back in and process it:

import csv

# we append each row into a list - we get lists of rows:
with open("d.txt","r",newline='') as r:
    reader = csv.reader(r, delimiter = ',', quotechar = "'", skipinitialspace = True)
    data = []
    for row in reader:
        data.append(row)

# we transpose these lists of rows into lists of columns, we seperate out the 
# time-row, we will need it multiple times - once for each day
time, *dataX = list(map(list,zip(*data)))
print(time)   # see (shortened) debug 
print(dataX)  # output below

# now we open a new csv, same settings then your old one:
with open("mod.txt","w",newline='') as w:
    writer = csv.writer(w,delimiter=',', quotechar="'",skipinitialspace=True, quoting=csv.QUOTE_ALL)
    # write a custom header
    writer.writerow(["date","time","value"])
    # for each row of data we need to create a new output row
    for r in dataX:
        # that we construct using the times we split out earlier
        for i,t in enumerate(time):
            if i==0: # this is just the text "'Time'" - dont need it
                continue
            # here we take the day ('Sun 01', 'Mon 02', ...), add the time t and index into the data
            writer.writerow([r[0],t,r[i]])


# read created file back in and print line-wise:
with open("mod.txt","r") as r:
    for row in r:
        print(row, end="")

Output:

# the time we split off
['Time', '00:00-00:05', '00:05-00:10', '00:10-00:15', '00:15-00:20', '00:20-00:25']

# the rest of the data
[['Sun 01', '0.30', '0.30', '0.30', '0.30', '0.30'], 
 ['Mon 02', '0.30', '0.30', '0.30', '0.30', '0.30'], 
 ['Tue 03', '0.30', '0.30', '0.30', '0.30', '0.40'], 
        **snipp - you get the gist of it **
 ['Sun 29', '0.10', '0.10', '0.20', '0.10', '0.10'], 
 ['Mon 30', '0.10', '0.10', '0.10', '0.00', '0.20']]

# the created file 
'date','time','value'
'Sun 01','00:00-00:05','0.30'
'Sun 01','00:05-00:10','0.30'
'Sun 01','00:10-00:15','0.30'
'Sun 01','00:15-00:20','0.30'
'Sun 01','00:20-00:25','0.30'
'Mon 02','00:00-00:05','0.30'
'Mon 02','00:05-00:10','0.30'
'Mon 02','00:10-00:15','0.30'
'Mon 02','00:15-00:20','0.30'
'Mon 02','00:20-00:25','0.30'
'Tue 03','00:00-00:05','0.30'
'Tue 03','00:05-00:10','0.30'
'Tue 03','00:10-00:15','0.30'
'Tue 03','00:15-00:20','0.30'
'Tue 03','00:20-00:25','0.40'
 **snipp - you get the gist of it **
'Sun 29','00:00-00:05','0.10'
'Sun 29','00:05-00:10','0.10'
'Sun 29','00:10-00:15','0.20'
'Sun 29','00:15-00:20','0.10'
'Sun 29','00:20-00:25','0.10'
'Mon 30','00:00-00:05','0.10'
'Mon 30','00:05-00:10','0.10'
'Mon 30','00:10-00:15','0.10'
'Mon 30','00:15-00:20','0.00'
'Mon 30','00:20-00:25','0.20'

If you print (for each row) instead of r[0] something like r[0].split()[-1] + "-01-2000" you would get even closer to your desired output. If you do like to have other quoting-options, read up on Quoting constants

HTH

I managed to find what feels like a much more convoluted solution. I'll be amending my code to use your method. Thanks Patrick! — Isamu, Jul 22 '18 at 08:05

Rearranging .csv data using Python

1 Answers1