
I have several data files that I load using

    import pandas as pd
    df = pd.concat((pd.read_csv(f[:-4] + '.txt', delimiter=r'\s+',
                                header=8) for f in files))
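An optional variation on the loading step (a sketch, not what the question does): passing `keys=` to `pd.concat` records which file each row came from in an outer index level, so the rows can later be grouped per file directly. The in-memory `texts` below are hypothetical stand-ins for the real files, which have extra lines before the column names (hence `header=8` in the question).

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two data files (no preamble lines here,
# so no header=8 is needed).
texts = [
    "Field Temp. Momentum\n200 25 0.541\n300 26 0.580\n400 25 0.700\n",
    "Field Temp. Momentum\nNaN 25 0.700\nNaN 50 0.500\nNaN 70 0.300\n",
]

# keys= adds an outer index level holding the file number (0, 1, ...); the inner
# level is still each file's own 0-based row index.
df = pd.concat(
    (pd.read_csv(io.StringIO(t), sep=r"\s+") for t in texts),
    keys=range(len(texts)),
)

# Group on the file number (outer level) and collect each column into a list.
grouped = df.groupby(level=0).agg(list)
print(grouped)
```

Grouping on `level=0` does not depend on the inner index restarting at 0, so it keeps working even if a file's rows are ever renumbered.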

The resulting DataFrame looks like this:

          Field    Temp.    Momentum
    0     200      25        0.541
    1     300      26        0.580
    2     400      25        0.700
    .      .       .           .
    .      .       .           .
    0     NaN      25        0.700
    1     NaN      50        0.500
    .     NaN      70        0.300
    .      .       .           .

I want to transform this into a pandas DataFrame where each row contains NumPy arrays, like so:

                    Field                         Temp.                           Momentum
    0     np.array([200, 300, 400])      np.array([25, 26, 25])        np.array([0.541, 0.580, 0.700])
    1                NaN                 np.array([25, 50, 70])        np.array([0.700, 0.500, 0.300])
    .
    .

The only way I can come up with is to loop through each row, append the values to a NumPy array, convert that array to a pandas Series, and append the Series to a DataFrame. This seems like a very roundabout way of solving the problem, and it is very slow. Is there a neater way of handling this?

Edit: The slow code is either loading with NumPy from the start, as shown below, or the method mentioned above, which I haven't actually coded but expect to be very slow:

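    import numpy as np
    import pandas as pd

    rows = []
    for f in files:  # files and columns are defined elsewhere
        contents = np.loadtxt(f, skiprows=12).T  # one 1-D array per column
        N = contents.shape[0]                    # number of columns in this file
        rows.append(pd.Series(list(contents), index=columns[:N]))
    # build the DataFrame once (DataFrame.append was removed in pandas 2.0)
    df = pd.DataFrame(rows)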
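A minimal sketch of a neater route, assuming each file should become exactly one row: build that row's arrays while reading the file, then create the DataFrame once at the end, so there is no per-row appending and no regrouping afterwards. The helper `file_to_row` and the in-memory `texts` standing in for the real files are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def file_to_row(frame: pd.DataFrame) -> dict:
    """Turn one file's table into a dict of column name -> NumPy array.

    A column that is entirely NaN becomes a single NaN, matching the desired output.
    """
    row = {}
    for col in frame.columns:
        values = frame[col].to_numpy()
        row[col] = np.nan if np.isnan(values).all() else values
    return row


# Hypothetical stand-ins for the real data files.
texts = [
    "Field Temp. Momentum\n200 25 0.541\n300 26 0.580\n400 25 0.700\n",
    "Field Temp. Momentum\nNaN 25 0.700\nNaN 50 0.500\nNaN 70 0.300\n",
]

# One dict per file; the DataFrame is built in a single call at the end.
rows = [file_to_row(pd.read_csv(io.StringIO(t), sep=r"\s+")) for t in texts]
result = pd.DataFrame(rows)
print(result)
```

With real files, `io.StringIO(t)` would be replaced by the file path plus `header=8` as in the question's loading code.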
Thomas

1 Answer


First, I think storing lists/arrays inside pandas cells this way is not a good idea.

Possible solution:

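    import numpy as np

    # Each file's rows are labelled 0, 1, 2, ... so the cumulative sum of
    # (index == 0) gives every file its own group number: 1, 2, ...
    df = (df.groupby((df.index == 0).cumsum())
            .agg(list)
            # on pandas >= 2.1, .map is the new name for .applymap
            .applymap(lambda x: np.nan if np.isnan(np.array(x)).all() else np.array(x)))
    print(df)
                       Field         Temp.            Momentum
    1  [200.0, 300.0, 400.0]  [25, 26, 25]  [0.541, 0.58, 0.7]
    2                    NaN  [25, 50, 70]     [0.7, 0.5, 0.3]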
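To see why the grouping key works, here is a self-contained run on the question's sample values, built by hand instead of read from files. Concatenating without `ignore_index=True` keeps each file's own 0-based index, so `df.index == 0` is `True` only at the first row of each file, and its cumulative sum numbers the files 1, 2, and so on. The helper name `to_cell` is illustrative; the `hasattr` check uses `DataFrame.map` where it exists (pandas 2.1+) and falls back to `applymap` otherwise.

```python
import numpy as np
import pandas as pd

# Two "files" concatenated without ignore_index, so each keeps its own 0-based index.
df = pd.concat([
    pd.DataFrame({"Field": [200, 300, 400], "Temp.": [25, 26, 25],
                  "Momentum": [0.541, 0.580, 0.700]}),
    pd.DataFrame({"Field": [np.nan] * 3, "Temp.": [25, 50, 70],
                  "Momentum": [0.700, 0.500, 0.300]}),
])

# True at the first row of each file; the running sum numbers the files 1, 2, ...
file_id = (df.index == 0).cumsum()
print(file_id)  # [1 1 1 2 2 2]


def to_cell(values):
    # Collapse an all-NaN group to a single NaN, otherwise keep a NumPy array.
    arr = np.array(values)
    return np.nan if np.isnan(arr).all() else arr


grouped = df.groupby(file_id).agg(list)
# DataFrame.map is the pandas >= 2.1 name for applymap; fall back on older versions.
elementwise = grouped.map if hasattr(grouped, "map") else grouped.applymap
out = elementwise(to_cell)
print(out)
```

The `Field` arrays come out as floats (`200.0`, ...) because concatenating with the all-NaN column upcasts `Field` to `float64`, which matches the answer's printed output.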
jezrael
  • This works exactly as intended - thanks! I am not sure how I would handle my data without a pandas DataFrame. Each row will hold one experiment, and other columns (not shown in this question) will let me quickly filter which Field/Temp./Momentum values to plot, and similar. – Thomas Nov 10 '20 at 10:56
  • @Thomas - If the data is not large, it is no problem; it is possible, just a bit complicated. – jezrael Nov 10 '20 at 10:57
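For the filtering use case described in the comment, a minimal sketch: the `sample` column below is a hypothetical stand-in for the extra metadata columns, and the array values are the question's sample data. Scalar columns filter with ordinary boolean indexing, while array-valued cells need an element-wise `apply` for per-experiment reductions.

```python
import numpy as np
import pandas as pd

# One row per experiment: array-valued measurement columns plus a hypothetical
# "sample" metadata column that exists only to show filtering.
experiments = pd.DataFrame({
    "sample": ["A", "B"],
    "Temp.": [np.array([25, 26, 25]), np.array([25, 50, 70])],
    "Momentum": [np.array([0.541, 0.580, 0.700]), np.array([0.700, 0.500, 0.300])],
})

# Ordinary boolean indexing on the scalar metadata column selects whole experiments.
selected = experiments[experiments["sample"] == "B"]

# Per-experiment reductions call a function on each stored array.
peak_momentum = selected["Momentum"].apply(lambda arr: arr.max())
print(peak_momentum)

# The arrays come back out unchanged, e.g. for plotting Momentum against Temp.
for temp, momentum in zip(selected["Temp."], selected["Momentum"]):
    print(temp, momentum)
```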