
I am trying to get the difference between consecutive elements after reading multiple CSV files. Each CSV file has 13 rows and 128 columns, and I want the column-wise difference.

I read the files using

data = [pd.read_csv(f, index_col=None, header=None) for f in _temp]

I get a list of all samples.

According to this, I have to use .diff() to get the difference, which goes something like this:

data.diff()

This works but instead of getting the difference between each row in the same sample, I get the difference between each row of one sample to another sample.

Is there a way to separate this and let the difference happen within each sample?

Edit

OK, I am able to get the difference between the data elements by doing this:

_local = pd.DataFrame(data)

_list = []
_a = _local.index

for _aa in _a:
    _list.append(_local[0][_aa].diff())

flow = pd.DataFrame(_list, index=_a)

I am creating too many DataFrames, is there a better way to do this?

Akshay

2 Answers


Here is a relatively efficient way to read your DataFrames one at a time and calculate the differences between consecutive files, which are stored in the list df_diff.

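import pandas as pd

df_diff = []
# header=None because the files have no header row (as in the question)
df_old = pd.read_csv(_temp[0], index_col=None, header=None)
for f in _temp[1:]:
    df = pd.read_csv(f, index_col=None, header=None)
    df_diff.append(df_old - df)  # difference between consecutive files
    df_old = df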
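
To see what this loop produces without touching real files, here is a small sketch that feeds it in-memory CSV text. The `samples` strings and the `read_sample` helper are made up for illustration and stand in for the files in `_temp`. Note that the result is the element-wise difference between one file and the next, not the difference between rows inside a single file.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files in _temp (2 rows x 3 columns each).
samples = ["1,2,3\n4,6,8\n", "2,4,6\n5,5,5\n", "0,0,0\n1,1,1\n"]

def read_sample(text):
    # Hypothetical helper: header=None because the files have no header row.
    return pd.read_csv(io.StringIO(text), index_col=None, header=None)

df_diff = []
df_old = read_sample(samples[0])
for text in samples[1:]:
    df = read_sample(text)
    df_diff.append(df_old - df)  # element-wise difference between consecutive files
    df_old = df

print(df_diff[0])
```

With N files this gives N - 1 DataFrames, each the same shape as one input file.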
Alexander
  • OK. What is the output of `df.shape`? I first want to ensure that you are reading the files correctly. It should be (13, 128). – Alexander Jun 08 '16 at 03:52

Since your code works, you should really post this on https://codereview.stackexchange.com/

(PS: The leading "_" is not really Pythonic. Please avoid it, as it makes your code harder to read.)

_local = pd.DataFrame(data)
_list = [_local[0][_aa].diff() for _aa in _local.index]
flow = pd.DataFrame(_list, index=_local.index)
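
If `data` is already a list of DataFrames, it may be possible to skip the intermediate DataFrame and call `.diff()` on each sample directly. The following is only a sketch under that assumption. The two small frames stand in for the real 13 x 128 samples, and the names `row_diffs`, `col_diffs`, `combined` and `grouped` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the list of samples read from the CSV files.
data = [
    pd.DataFrame([[1, 2, 4], [3, 3, 3], [6, 5, 4]]),
    pd.DataFrame([[10, 20, 40], [10, 10, 10], [0, 5, 9]]),
]

# Row-to-row difference inside each sample (the first row becomes NaN).
row_diffs = [df.diff() for df in data]

# Column-to-column difference inside each sample (the first column becomes NaN).
col_diffs = [df.diff(axis=1) for df in data]

# Optionally stack the per-sample results into one frame keyed by sample number.
flow = pd.concat(row_diffs, keys=range(len(data)))

# If the samples are concatenated first, grouping by the sample key keeps
# the difference from crossing from one sample into the next.
combined = pd.concat(data, keys=range(len(data)))
grouped = combined.groupby(level=0).diff()
```

Here `flow.loc[1]` returns the differences for the second sample only, and the first row of every sample is NaN because it has no previous row within the same sample.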
Merlin