
I have a time series dataset in a .csv file that I want to process with Pandas (using Canopy). The column names from the file are a mix of strings and isotopic numbers.

    cycles      40   38.02   35.98      P4
0        1  1.1e-8  4.4e-8  7.7e-8  8.8e-7
1        2  2.2e-8  5.5e-8  8.8e-8  8.7e-7
2        3  3.3e-8  6.6e-8  9.9e-8  8.6e-7

I would like this DataFrame to look like this

    cycles      40      38      36      P4
0        1  1.1e-8  4.4e-8  7.7e-8  8.8e-7
1        2  2.2e-8  5.5e-8  8.8e-8  8.7e-7
2        3  3.3e-8  6.6e-8  9.9e-8  8.6e-7

The .csv files won't always have exactly the same column names; the numbers can differ slightly from file to file. To handle this, I've pulled out the column names and rounded the numeric ones to the nearest integer. This is what my code looks like so far:

import pandas as pd
import numpy as np

# Sample data standing in for the .csv contents
data = {'cycles': [1, 2, 3],
        '40': [1.1e-8, 2.2e-8, 3.3e-8],
        '38.02': [4.4e-8, 5.5e-8, 6.6e-8],
        '35.98': [7.7e-8, 8.8e-8, 9.9e-8],
        'P4': [8.8e-7, 8.7e-7, 8.6e-7]}
df = pd.DataFrame(data, columns=['cycles', '40', '38.02', '35.98', 'P4'])

# Round the numeric column names to the nearest integer, then back to strings
colHeaders = df.columns.values.tolist()
colHeaders[1:4] = list(map(float, colHeaders[1:4]))
colHeaders[1:4] = list(map(np.around, colHeaders[1:4]))
colHeaders[1:4] = list(map(int, colHeaders[1:4]))
colHeaders = list(map(str, colHeaders))

I tried df.rename(columns={df.loc[ 1 ]:colHeaders[ 0 ]}, ...), but I get this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I've read this post as well as the pandas 0.17 documentation, but I can't figure out how to use rename to selectively replace the column names without assigning each new column name manually, as in this post.
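
For a single column I think a manual mapping like the one below would work (a sketch based on my reading of the rename documentation), but I don't see how to scale it to every numeric column without typing each name out by hand:

df = df.rename(columns={'38.02': '38'})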

I'm fairly new to Python and I've never posted on StackOverflow before, so any help would be greatly appreciated.


1 Answer


You could use a variant of your approach, but assign the new column names directly:

>>> cols = list(df.columns)
>>> cols[1:-1] = [int(round(float(x))) for x in cols[1:-1]]
>>> df.columns = cols
>>> df
   cycles            40            38            36            P4
0       1  1.100000e-08  4.400000e-08  7.700000e-08  8.800000e-07
1       2  2.200000e-08  5.500000e-08  8.800000e-08  8.700000e-07
2       3  3.300000e-08  6.600000e-08  9.900000e-08  8.600000e-07
>>> df.columns
Index(['cycles', 40, 38, 36, 'P4'], dtype='object')

Or you could pass a function to rename:

>>> df = df.rename(columns=lambda x: x if x[0].isalpha() else int(round(float(x))))
>>> df.columns
Index(['cycles', 40, 38, 36, 'P4'], dtype='object')
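
If you want to reuse the colHeaders list you already built, another variant (a sketch, starting again from the original frame) is to zip the old and new labels into an explicit mapping for rename; note that this keeps the labels as strings, since colHeaders was passed through str:

>>> df = df.rename(columns=dict(zip(df.columns, colHeaders)))
>>> df.columns
Index(['cycles', '40', '38', '36', 'P4'], dtype='object')
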
  • Thanks, that's much cleaner. The next step is to go from '40' to 'Ar40' and from '132' to 'Xe132', which is why I was hoping for a way to overwrite all of the column names with a new list of column names. Is that just better handled with a manual rename? – Icarus Nov 14 '15 at 13:44
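
A sketch of that follow-up mapping, covering only the two isotopes named in the comment (the mass-to-element lookup here is a hypothetical example, not a general solution):

>>> symbols = {40: 'Ar', 132: 'Xe'}  # hypothetical lookup for illustration only
>>> df = df.rename(columns=lambda c: '{}{}'.format(symbols[c], c) if c in symbols else c)
>>> df.columns
Index(['cycles', 'Ar40', 38, 36, 'P4'], dtype='object')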