import pandas as pd

I have a dataframe:

df=pd.DataFrame({'cmplxnumbers':[1+1j,2-2j,3*(1+1j)]})

I need to get the imaginary parts of the numbers in the column.

I do it by:

df.cmplxnumbers.apply(lambda number: number.imag)

I get as a result:

0    1.0
1   -2.0
2    3.0
Name: cmplxnumbers, dtype: float64

Which is as expected.

Is there any quicker, more straightforward method, perhaps not involving the lambda function?

zabop
  • Do you have some speed problem to report? This is pretty straightforward, as solutions go. You need to apply the `.imag` qualifier (essentially a `get` function call) to each element of the column. The `lambda` constructor is applied only once; the resulting function lives until the entire column is handled. – Prune Jan 20 '21 at 16:46
  • I don't have a speed problem in particular, I didn't find a better answer to the question than the one I presented, and it seemed odd that there isn't an easier way of doing it. – zabop Jan 20 '21 at 17:00

2 Answers

A pandas DataFrame or Series is built on top of NumPy arrays, so it can be passed to most NumPy functions.

In this case, you can try the following, which should be faster than the non-vectorized .apply:

import numpy as np

df['imag'] = np.imag(df.cmplxnumbers)
df['real'] = np.real(df.cmplxnumbers)

Output:

         cmplxnumbers  imag  real
0  1.000000+1.000000j   1.0   1.0
1  2.000000-2.000000j  -2.0   2.0
2  3.000000+3.000000j   3.0   3.0
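
As a quick sanity check, the sketch below rebuilds the dataframe from the question and compares the `.apply` result with the `np.imag` result. The names `via_apply`, `via_numpy`, and `same` are only illustrative, and both results are converted with `np.asarray` because the return type of `np.imag` on a Series may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'cmplxnumbers': [1+1j, 2-2j, 3*(1+1j)]})

# Element-wise approach from the question.
via_apply = df.cmplxnumbers.apply(lambda number: number.imag)

# Vectorized approach from this answer.
via_numpy = np.imag(df.cmplxnumbers)

# Compare the raw values, ignoring index and container type.
same = np.array_equal(np.asarray(via_apply), np.asarray(via_numpy))
print(same)
```

If a speed comparison matters for your data, timing both lines with `timeit` on a realistically sized column is the safest way to confirm the difference.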

Or you can use agg:

df[['real','imag']] = df.cmplxnumbers.agg([np.real, np.imag])
Quang Hoang
  • It doesn't seem to work for me using pandas v1.4.1: AttributeError: 'DataFrame' object has no attribute 'cmplxnumbers' – nicrie Aug 25 '22 at 12:03
  • @nicrie 'cmplxnumbers' here is the column name, you can replace `df.cmplxnumbers` with the actual column in your data, e.g. `df['col1']`. – Quang Hoang Aug 25 '22 at 12:13

Use the .to_numpy() method of a pandas object to get a NumPy array, which has the real and imag attributes. In summary:

real_part = pd_obj.to_numpy().real
imag_part = pd_obj.to_numpy().imag

See here a discussion about .values vs .array vs .to_numpy().
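
Applied to the dataframe from the question, this could look like the sketch below. Here `pd_obj` is taken to be the `cmplxnumbers` column, and the intermediate name `arr` is only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'cmplxnumbers': [1+1j, 2-2j, 3*(1+1j)]})

# Convert the column to a NumPy array once, then read both parts from it.
arr = df['cmplxnumbers'].to_numpy()
real_part = arr.real
imag_part = arr.imag

# Optionally store the parts back as new columns.
df['real'] = real_part
df['imag'] = imag_part
print(df)
```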

Dev-iL
  • Nice. [The docs](https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html) say "We recommend using Series.array or Series.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array", so I think `to_numpy()` is probably better. – zabop Mar 09 '23 at 08:50
  • Thanks, I've updated the answer. Also, note that a `DataFrame` doesn't have an `.array` field. – Dev-iL Mar 09 '23 at 10:13