2

I have a project that uses 3D data saved in text files. I am currently using a single space to separate values along the first dimension, one line feed (\n) to separate the second dimension and two line feeds (\n\n) to separate the last dimension, and I read and write the files with Python's built-in file I/O. The data are parsed using string splits and list comprehensions. Is there a way to do this using pandas?

I already tested dataframe.write with 3D numpy data and got the following error: ValueError: Must pass 2-d input. Is it possible to work around this?
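For illustration only, the text layout described above could be written and parsed roughly as follows. This is a minimal sketch, not the original project code: the helper names dumps_3d and loads_3d and the sample data are hypothetical, and values are assumed to be numeric.

```python
def dumps_3d(data):
    # Spaces within a row, one newline between rows, a blank line between 2D blocks.
    return "\n\n".join(
        "\n".join(" ".join(str(v) for v in row) for row in block)
        for block in data
    )


def loads_3d(text):
    # Reverse of dumps_3d, using string splits and list comprehensions.
    return [
        [[float(v) for v in row.split(" ")] for row in block.split("\n")]
        for block in text.strip().split("\n\n")
    ]


# Hypothetical 2 x 2 x 3 sample
data = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
text = dumps_3d(data)
restored = loads_3d(text)
```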

Hemerson Tacon
  • Can you post your code and some sample data? See: https://stackoverflow.com/help/mcve – Evan Dec 13 '17 at 19:12
  • Possible duplicate of https://stackoverflow.com/questions/3685265/how-to-write-a-multidimensional-array-to-a-text-file ? – B. M. Dec 13 '17 at 19:41
  • You should check out xarray and their recommended method of writing such multidimensional data by means of NetCDF. Normal text files are not a good choice for 3D data. – Georgy Dec 13 '17 at 21:11

2 Answers

3

Pandas has a Panel class to manage 3D arrays, which it represents as unstacked DataFrames. Some axis transformations are required, however, to get the correct layout in the text file:

import numpy as np
import pandas as pd

a = np.arange(27).reshape(3, 3, 3)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

Writing:

# pd.Panel exists only in pandas < 0.25; rollaxis puts a[i, j, :] on each text row
df = pd.Panel(np.rollaxis(a, 2)).to_frame()
df.to_csv('datap.txt')

Then the text file contains:

major,minor,0,1,2
0,0,0,1,2
0,1,3,4,5
0,2,6,7,8
1,0,9,10,11
1,1,12,13,14
1,2,15,16,17
2,0,18,19,20
2,1,21,22,23
2,2,24,25,26

You can also use to_html to enhance readability.

You can then read back:

# read the file back and undo the axis transformation
df = pd.read_csv('datap.txt', index_col=[0, 1])
a2 = np.rollaxis(np.rollaxis(df.to_panel().values, 2), 2)

In [161]: np.allclose(a,a2)
Out[161]: True

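Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so on current versions you will have to use the xarray module (or a DataFrame with a MultiIndex) for this.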
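On a pandas version without Panel, the same round trip can be sketched with a MultiIndex DataFrame instead. This is a minimal sketch under assumptions: the index names "major" and "minor" are chosen to mimic the Panel output, the file goes to a temporary directory, and the array is assumed to be dense (every major/minor combination present) so that the shape can be recovered from the index levels.

```python
import os
import tempfile

import numpy as np
import pandas as pd

a = np.arange(27).reshape(3, 3, 3)
n0, n1, n2 = a.shape

# One text row per (i, j) pair, holding a[i, j, :]
index = pd.MultiIndex.from_product(
    [range(n0), range(n1)], names=["major", "minor"]
)
df = pd.DataFrame(a.reshape(n0 * n1, n2), index=index)

path = os.path.join(tempfile.mkdtemp(), "datap.txt")
df.to_csv(path)

# Read back and rebuild the 3D array from the two index levels
df2 = pd.read_csv(path, index_col=[0, 1])
shape = (len(df2.index.levels[0]), len(df2.index.levels[1]), df2.shape[1])
a2 = df2.to_numpy().reshape(shape)
```

Because the rows are written in (major, minor) order, a plain reshape is enough here and no axis rolling is needed.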

B. M.
1

I don't know of a really clean solution to this, but one way to approach it manually is as follows:

import pandas as pd
import numpy as np

df = pd.read_csv('tmp.csv', skip_blank_lines=False)

# add a blank row at the end
df = df.reindex(np.arange(len(df.index) + 1))

# add an index of the third dimension
indices = df[df.isnull().all(axis=1)].index
df['level'] = pd.Series(range(len(indices)), index=indices)
df['level'] = df['level'].bfill()

# reset indices within each "group"
df = df.groupby('level').apply(lambda x: x.reset_index())
df = df.drop(['level', 'index'], axis=1).dropna(how='all')

The result is a DataFrame with a MultiIndex representing your 3D data.
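A variant of the same idea, sketched here under assumptions rather than tested against the real files: split the text on blank lines first, parse each 2D block separately, and let pd.concat build the outer index level. The sample text, the space separator and the level names "level" and "row" are hypothetical choices for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: spaces within a row, one newline between rows,
# one blank line between 2D blocks.
text = "0 1 2\n3 4 5\n\n6 7 8\n9 10 11\n"

# Parse each blank-line-separated block as its own 2D frame
blocks = text.strip().split("\n\n")
frames = [
    pd.read_csv(io.StringIO(block), sep=" ", header=None) for block in blocks
]

# Stack the blocks; the keys become the outer level of the MultiIndex
df = pd.concat(frames, keys=range(len(frames)), names=["level", "row"])

# Optional: recover a 3D numpy array (assumes all blocks have the same shape)
arr = df.to_numpy().reshape(len(frames), -1, df.shape[1])
```

With this layout, df.loc[1] selects the second 2D block and arr[1] gives the same block as a numpy array.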

jakevdp