2

I'm a beginner with python. I'm trying to get the difference between two adjacent columns in a csv file using python 2.7.

Sample input:

    Temperature        20     21     23     27 ...

    Smoke Obscuration  0.1    0.3    0.6    0.7 ...

    Carbon Dioxide     0.05   0.07   0.08   0.09 ...

    ......

    ......

I want to calculate the difference between two adjacent values and get the output like this:

    Temperature        0   1      2      4 ...

    Smoke Obscuration  0   0.2    0.3    0.1 ...

    Carbon Dioxide     0   0.02   0.01   0.01 ...

    ......

    ......

this is as far as I got:

import csv
with open("test1.csv", "rb") as f_in, open("test2.csv", "w") as f_out:
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    for row in r:
        for i, v in enumerate(row):
        if i > 1:
                v = (float(row[i]) - float(row[i-1]))
        w.writerow(row)

It gave an error:

ValueError: could not convert string to float:

Could anyone help? Any guidance would be appreciated.

Alex
  • 29
  • 3
  • 1
    This is a nicely written question. For future reference, try using `print` statements to debug your output – pylang Jan 11 '17 at 00:15
  • That is an odd csv file. It appears to be space separated (you need `delimiter=' '` when you create the reader), but the first column names sometimes have spaces themselves (e.g., "Smoke Obscuration"). Right now, the csv reader thinks you have 1 giant column (its looking for commas). But because of the problem of spaces in the first column, I think you are going to need something else besides `csv.reader` to crack it. – tdelaney Jan 11 '17 at 00:35
  • 1
    Try the pandas library. This should be reaaaally simple, like maybe two lines of code, three including the 'import'. – vmg Jan 11 '17 at 00:51
  • @vmg have you reaaaaaaally tried it? The problem (partially) that column zero spaces aren't escaped and this will confuse most parsers. – tdelaney Jan 11 '17 at 00:57
  • True, I didn't even see that. But preprocessing that should not be hard at all, like removing all 'non-numeric strings' from every line of the file. If the data is well-behaved like the example, there must be a regex out there that will do that in an extra couple of lines at most. – vmg Jan 11 '17 at 01:02
  • 1
    Perhaps the real solution is to fix where ever this file came from in the first place. If its exported from something, see if you can make that something spit out valid CSV data. If you are copy/pasting from excel... well, that would explain why it looks so strange. – tdelaney Jan 11 '17 at 01:28

2 Answers2

0

You may have some spacing issues with your source file, so it may be hard to reproduce your specific error. Since I do not have your original file, I copied your data from here into a text file, then re-saved it as a csv in Excel. I didn't have the error you encountered other than the wrong output. This suggests that that data can be read and written fine provided the logic is correct.

Option 1: Use the csv module

I corrected some logic mainly by making each row an iterable (i.e. list), which the writerow method requires:

import csv

# with open("test1.csv", "r") as f_in, open("test2.csv", "w", newline="") as f_out: # python 3
with open("test1.csv", "r") as f_in, open("test2.csv", "wb") as f_out:            # python 2
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    values = []
    for row in r:
        for i, v in enumerate(row):
            if i == 0:
                values.append(v)
            if i == 1:
                values.append(0)
            if i > 1:
                values.append(float(row[i]) - float(row[i-1]))
        w.writerow(values)
        values = []

Option 2: Use the pandas library

You can pip install pandas or (conda install pandas if you use Anaconda) and do this more simply:

import pandas as pd

df = pd.read_csv("test1.csv", header=None, index_col=0)
df2 = df.diff(axis=1)
df2.to_csv("test2.csv", header=False, na_rep=0)

Output csv for both options (in Excel)

enter image description here

When opened in a text editor, these outputs are comma-delimited by default. There are separate options for choosing different spacings if desired (see References).

Try these options. If you have errors, confirm that your source files are clean so that they are read correctly. For now, use print statements to verify the output you desire.

References:

  1. CSV file written with Python has blank lines between each row
  2. How do I write data to csv file in columns and rows from a list in python?
  3. How to use delimiter for csv in python
  4. delimiter - Writing a pandas to a csv file
Community
  • 1
  • 1
pylang
  • 40,867
  • 14
  • 129
  • 121
  • Certainly. This can all be simplified with a library like `pandas` assuming the source file is valid. My manual steps were due to lack of the original file containing the data. – pylang Jan 11 '17 at 00:58
  • 1
    Thank you so much for your help and for the references! :) – Alex Jan 11 '17 at 13:39
0

Your input file is not an easily parsed csv file. It uses spaces to dilimit columns but also uses spaces within column zero. I don't think the csv module will help you, but you can parse the line yourself with a couple of regexes. My example works by assuming column 0 names do not include digits. If that's not true in general, its going to break.

import re

_col_0_re = re.compile(r'[^\d]+')
_col_x_re = re.compile(r'[\d\.]+')

def get_row(line):
    row = []
    line = line.strip()
    match = _col_0_re.match(line)
    if match:
        # pull out column 0 string
        row.append(line[:match.end()].strip())
        # find the remaining floats on the line
        row.extend(float(col) for col in _col_x_re.findall(line[match.end():]))
    return row

with open("test1.csv", "r") as f_in, open("test2.csv", "w") as f_out:
    for line in f_in:
        row = get_row(line)
        print(row)
        if row:
            diffs = (row[i] - row[i-1] for i in range(2, len(row)))
            diff_str = ''.join('{:10.2f}'.format(diff) for diff in diffs)
            f_out.write('{0:20}  0 {1}\n'.format(row[0], diff_str))

The output from your sample data is

Temperature           0       1.00      2.00      4.00
Smoke Obscuration     0       0.20      0.30      0.10
Carbon Dioxide        0       0.02      0.01      0.01
tdelaney
  • 73,364
  • 6
  • 83
  • 116
  • Nice attempt with `re`. When I open the `csv` in Excel, I get only two columns although it reads fine in a text editor. – pylang Jan 11 '17 at 01:44
  • @pylang - Importing to another tool has the same problem we had reading it in python. Perhaps its got fixed-width columns and a little tweaking of the `{:10.2f}` format is in order. But I don't know enough of the rules for the file to say for sure. – tdelaney Jan 11 '17 at 01:53
  • Understood. Are we sure the file looks as presented? The author may have edited the spacing to look nice for the post. – pylang Jan 11 '17 at 01:59
  • Thank you so much! That helped a lot! :) – Alex Jan 11 '17 at 13:38