1

I am trying to make a program that takes a large data file of integers and creates a new csv in a different format, where it takes the x,y,z of 30 lines and merges them into one line.

The large dataset is formatted in (timestamp, x,y,z)

Ex:

0.000, 5, 6, 8,

1.000, -6, 7, 9,

2.000, -15, 25, 23,

or:

timestamp, x1, y1, z1

timestamp, x2, y2, z2

timestamp, x3, y3, z3

The new dataset would look like this:

delta timestamp, x1, y1, z1, x2, y2, z2, x3, y3, z3....x30, y30, z30,

delta timestamp, x31, y31, z31, x32, y32, z32, x33,... x60, y60, z60,

etc.. (each line containing 30 x,y,z's)

I thought of maybe appending a \n every 30 lines and then replacing each new line with a comma. And my code below doesn't work. It just puts an extra comma to where the new data looks like this:

timestamp, x1, y1, z1,, timestamp, x2, y2, z2,, timestamp... 

Do you guys have any ideas?

list = []
import csv
i=0
results = []
with open('bikefall.csv', newline='') as inputfile:
    for row in csv.reader(inputfile):
        i+=1
        if i%30==0:
            results.append(row)
            results.append('\n')
        else:
            results.append(row)

print("\n".join([item.replace('\n', ',') for item in 
open('bikefall.csv').read().split('\n\n')]))
Drise
  • 4,310
  • 5
  • 41
  • 66
aval
  • 21
  • 4

3 Answers3

0

I didn't know how you calculate your delta so I just put a placeholder function.

Regarding your code you can improve it a little using enumerate so you don't have to update i manually.

You can also get the first 4 items of each row in a csv file using the slice notation.

import csv

def calculate_delta(timestamps):
    pass

output = ""

with open('bikefall.csv', "r") as inputfile:
    timestamps = []
    results = []
    for i, row in enumerate(csv.reader(inputfile)):
        timestamp, x, y, z = row[:4]
        timestamps.append(timestamp)
        results.extend((x, y, z))
        if len(timestamps) == 30:
            delta = calculate_delta(timestamps)
            str_timestamps = ", ".join(results)
            output += "{}, {}\n".format(delta, str_timestamps)
            timestamps = []
            results = []

print(output)

This code has a bug though, what happens when you only have 29 rows in your CSV?

Those 29 rows are going to get ignored, so you still need to check if the current row is the last one in the csv file, and process it accordingly.

sainoba
  • 158
  • 1
  • 13
0

One approach would be to read your CSV file in blocks of 30 at a time. Then combine these rows. I have assumed delta is calculated by subtracting the first timestamp from the last timestamp in each block (another possibility would be difference between start of each block, so first would be 0?):

from itertools import zip_longest
import csv

f_input = open('bikefall.csv', newline='')
f_output = open('output.csv', 'w', newline='')

with f_input, f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for rows in zip_longest(*[iter(csv_input)] * 30, fillvalue=None):
        rows = [[float(row[0])] + row[1:] for row in rows if row]
        delta = rows[-1][0] - rows[0][0]
        combined = [delta]

        for row in rows:
            combined.extend([row[1], row[2], row[3]])

        csv_output.writerow(combined)

The grouping is based on the itertools grouper() recipe in the Python documentation.

Martin Evans
  • 45,791
  • 17
  • 81
  • 97
  • Each row contained the same elements and it seemed to be duplicating the previous row and constantly adding on 3 more x, y, z's, getting larger and larger. Am I doing something incorrectly? – aval Mar 17 '18 at 22:37
  • I tried to run your code and it created a csv file 10x as large as my original. It outputted a weird dataset. Ex: 0.387,-2,-3 0.387,-2,-3,-4,-5, 0.387,-2,-3,-4,-5,-6,-7 0.387,-2,-3,-4,-5,-6,-7,-8,-9 0.387,-2,-3,-4,-5,-6, -7,-8,-9,-10,-11 – aval Mar 17 '18 at 22:41
  • Are you sure you copy / pasted the script the same? Make sure the indentation is identical. – Martin Evans Mar 17 '18 at 22:55
0

This is a perfect job for zip. Here is a solution, much more pythonic than previous answers:

with open('bikefall.csv') as inputfile:
    # version using csv reader
    matrix = [[line[0],','.join(line[1:])] for line in csv.reader(inputfile)]
    # version using standard text file reader
    #matrix = [line.strip().split(',', maxsplit=1) for line in inputfile]

stamps, coords = zip(*matrix) # split matrix into stamps and coords

for n in range(0, len(stamps), 30):
  print(','.join((stamps[n],) + coords[n:n+30]))

Note: thanks to the slice notation, the last line with possibly less than 30 items, is managed automagically.

sciroccorics
  • 2,357
  • 1
  • 8
  • 21
  • I ran your code and I came up with an error: AttributeError: 'list' object has no attribute 'split'. Am I missing something? – aval Mar 17 '18 at 22:32
  • @aval: it's not you, it's me! I wrote two versions (one using csv reader, and one using text file reader) and messed up when I copy/paste the lines. I've edited the code, including the two versions (the latter commented out) so you may choose which one you prefer – sciroccorics Mar 18 '18 at 12:51