I have a list that contains many arrays.

coef

[array([[1.72158862]]),
 array([[3.28338167]]),
 array([[3.28004542]]),
 array([[6.04194548]])]

Putting it into a DataFrame gives:

azone = pd.DataFrame(
    {'zone': zone,
     'coef': coef
    })

    zone    coef
0   1   [[1.7215886175218464]]
1   2   [[3.283381665861124]]

I wonder if there is a way to remove the brackets. I tried tolist(), but it did not give me the result I wanted.

Also for another list:

value

[array([8.46565297e-294, 1.63877641e-002]),
 array([1.46912451e-220, 2.44570170e-003]),
 array([3.80589351e-227, 4.41242801e-004])]

I want to keep only the second value of each array. The desired output is:

   value
0  1.63877641e-002
1  2.44570170e-003
2  4.41242801e-004
Celine
  • The brackets show us that the arrays have a (1,1) shape. They aren't just a pretty printing device. – hpaulj Jul 04 '18 at 01:11

2 Answers

Using ravel:

import numpy as np

coef = [np.array([[1.72158862]]),
        np.array([[3.28338167]]),
        np.array([[3.28004542]]),
        np.array([[6.04194548]])]

# Stack the (1, 1) arrays into one array, then collapse it to 1-D
coef = np.array(coef).ravel()

print(coef)

[1.72158862 3.28338167 3.28004542 6.04194548]

Furthermore, if you are not going to modify the returned 1-D array, I suggest using numpy.ravel. It returns a view of the original array whenever possible instead of copying it, which makes it faster than flatten, which always returns a copy.
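The view-versus-copy difference can be checked directly. This is a minimal sketch that reuses the coef list from above; the names stacked, raveled, flattened, ravel_shares and flatten_shares are only illustrative. It assumes the stacked array is contiguous, which is the case in which ravel can return a view.

```python
import numpy as np

coef = [np.array([[1.72158862]]),
        np.array([[3.28338167]]),
        np.array([[3.28004542]]),
        np.array([[6.04194548]])]

# Stacking four (1, 1) arrays gives one array of shape (4, 1, 1)
stacked = np.array(coef)

# ravel returns a view when the data is contiguous; flatten always copies
raveled = stacked.ravel()
flattened = stacked.flatten()

# np.shares_memory reports whether two arrays use the same underlying buffer
ravel_shares = np.shares_memory(stacked, raveled)
flatten_shares = np.shares_memory(stacked, flattened)

print(stacked.shape, raveled.shape)
print(ravel_shares, flatten_shares)
```

Because the raveled array may share memory with its source, writing to it can change the source array as well, which is why the advice above is limited to the read-only case.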

min2bro
  • While I agree this is a better solution than mine and should probably be accepted, it is worth noting that the performance differential is marginal (copying an array is cheap). You will be feeding the array into `pd.DataFrame`, which means you'll always need a copy. For 4 million items, I see 2.88s vs 2.75s. – jpp Jul 04 '18 at 09:14

You can use NumPy's flatten method to extract a one-dimensional array from your list of multi-dimensional arrays. For example:

import numpy as np

coef = [np.array([[1.72158862]]),
        np.array([[3.28338167]]),
        np.array([[3.28004542]]),
        np.array([[6.04194548]])]

# Stack the (1, 1) arrays into one array, then copy it out as 1-D
coef = np.array(coef).flatten()

print(coef)

[1.72158862 3.28338167 3.28004542 6.04194548]

Since NumPy arrays underlie Pandas dataframes, you will find that your Pandas coef series is now of dtype float and contains only scalars.
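As a rough sketch of the full round trip, the flattened array can be passed straight to pd.DataFrame. The zone values below are an assumption (the question only shows 1 and 2), and the second frame applies the same idea to the value list from the question, using column indexing to keep the second element of each array. This is one possible approach, not necessarily the only one.

```python
import numpy as np
import pandas as pd

zone = [1, 2, 3, 4]  # assumed: the question only shows zones 1 and 2

coef = [np.array([[1.72158862]]),
        np.array([[3.28338167]]),
        np.array([[3.28004542]]),
        np.array([[6.04194548]])]

value = [np.array([8.46565297e-294, 1.63877641e-02]),
         np.array([1.46912451e-220, 2.44570170e-03]),
         np.array([3.80589351e-227, 4.41242801e-04])]

# Flatten the (4, 1, 1) stack so each DataFrame cell holds a plain float
azone = pd.DataFrame({'zone': zone,
                      'coef': np.array(coef).flatten()})

# np.array(value) has shape (3, 2); [:, 1] selects the second column
second = pd.DataFrame({'value': np.array(value)[:, 1]})

print(azone.dtypes)
print(second)
```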

jpp