10

I am trying to add column headers to csv file that I have parsed into a dataframe withing Pandas.

dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1)     # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID'
                                      'Code']);


print df

The structure of the data is:

01/10/2015 05:47.3  190 190 -648 838 EURNOK -648 0  0 611   -1137   -648 H 2     -1000000   9.465   -648    INTERNAL    IOC 287 AS

What Pandas returns is:

  TradeDate  TradeTime  CumPnL  DailyCumPnL  RealisedPnL  UnRealisedPnL  \
0            NaN        NaN     NaN          NaN          NaN            NaN   ...

I would appreciate any advice on the issue.

Thanks

Ps. Thanks to Ed for his answer. I have tried your suggestion with

df = dfTrades.columns=['TradeDate',
                   'TradeTime',
                   'CumPnL',
                   'DailyCumPnL',
                   'RealisedPnL',
                   'UnRealisedPnL',
                   'CCYCCY',
                   'CCYCCYPnLDaily',
                   'Position',
                   'CandleOpen',
                   'CandleHigh',
                   'CandleLow',
                   'CandleClose',
                   'CandleDir',
                   'CandleDirSwings',
                   'TradeAmount',
                   'Rate',
                   'PnL/Trade',
                   'Venue',
                   'OrderType',
                   'OrderID'
                   'Code'];

But now the problem has morphed to:

 ValueError: Length mismatch: Expected axis has 22 elements, new values have     21 elements

I have taken the shape of the matrix and got: dfTrades.shape

(12056, 22)

So sadly i still need some help :(

noidea
  • 1,149
  • 3
  • 8
  • 9
  • Your last error is clear you have 22 columns but you're trying to pass a list of column names of 21, it's unclear what you expect this to do: `dfTrades.set_index([dfTrades.index]);` – EdChum Jan 07 '16 at 16:47
  • So what fixed your problem? – EdChum Jan 08 '16 at 13:37
  • Hi EdChum - Thank you for your help. Problem fixed, sorry for coming back late. – noidea Jan 08 '16 at 16:10

3 Answers3

8

Assign directly to the columns:

df.columns = ['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID'
                                      'Code']

What you're doing is reindexing and because the columns don't agree get all NaNs as you're passing the df as the data it will align on existing column names and index values.

You can see the same semantic behaviour here:

In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df

Out[240]:
          0         1         2
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706

In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1

Out[242]:
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

Alternatively you can pass the np array as the data:

df = pd.DataFrame(dfTrades.values,columns=['TradeDate',

In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1

Out[244]:
          a         b         c
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706
EdChum
  • 376,765
  • 198
  • 813
  • 562
1

You can try this way: You can use names directly in the read_csv

names : array-like, default None List of column names to use. If the file contains no header row, then you should explicitly pass header=None

Cov = pd.read_csv("path/to/file.txt", sep='\t', 
                  names = ["Sequence", "Start", "End", "Coverage"])
Frame=pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])

this answer.

Farshid Shekari
  • 2,391
  • 4
  • 27
  • 47
1

you need to do dfTrades.values instead of dfTrades when passing to pandas pd.DataFrame.

column_names= ['TradeDate',
               'TradeTime',
               'CumPnL',
               'DailyCumPnL',
               'RealisedPnL',
               'UnRealisedPnL',
               'CCYCCY',
               'CCYCCYPnLDaily',
               'Position',
               'CandleOpen',
               'CandleHigh',
               'CandleLow',
               'CandleClose',
               'CandleDir',
               'CandleDirSwings',
               'TradeAmount',
               'Rate',
               'PnL/Trade',
               'Venue',
               'OrderType',
               'OrderID'
               'Code']


df1 = pd.DataFrame(dfTrades.values, columns = column_names )

df1.head()
user2653663
  • 2,818
  • 1
  • 18
  • 22
Jaynab
  • 11
  • 1