I have a time series with price information in the column price. When I tried to create a new column ln_price by taking the natural log of price, I got an error:

AttributeError: 'float' object has no attribute 'log'

Can someone help me understand why this would be and how it can be fixed?

Thanks!

df['ln_price'] = np.log(df['price'])
hpaulj
Tony
  • Are you sure this is all the relevant code? – Finomnis Jul 02 '19 at 21:06
  • You have a float variable `np` in scope. – Andy Hayden Jul 02 '19 at 21:07
  • See https://stackoverflow.com/questions/47208473/attributeerror-numpy-float64-object-has-no-attribute-log10/47208873#47208873 – Warren Weckesser Jul 02 '19 at 21:08
  • @AndyHayden Same thought. – Finomnis Jul 02 '19 at 21:10
  • @Tony, your question is not complete. It will be much easier for someone to help you if you provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) that anyone can copy and run to reproduce the problem. – Warren Weckesser Jul 02 '19 at 21:22
  • @WarrenWeckesser Thank you, absolutely. Like, what is the variable `df`? Because that would be valuable information. – Finomnis Jul 02 '19 at 21:22
  • Like @WarrenWeckesser, I suspect the `dtype` of `df['price']` is `object`. `numpy` functions like this can't operate directly on object arrays. Instead they try to delegate the action to a corresponding method of the objects, hence this error. – hpaulj Jul 02 '19 at 22:03
  • I took the liberty of adding the `pandas` tag. – hpaulj Jul 02 '19 at 23:21

2 Answers

As Warren Weckesser pointed out, this can also happen when the column has dtype object (and this is in fact the more likely cause in your case):

>>> s = pd.Series([1.0], dtype='object')
>>> s
0    1
dtype: object
>>> np.log(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'
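
The message names a 'float' object because, for object arrays, NumPy appears to delegate to a method called log on each element rather than computing the logarithm itself. The sketch below illustrates that behaviour under stated assumptions: Price is a hypothetical class invented for this example, and the exact exception type for plain floats may differ between NumPy versions.

```python
import math
import numpy as np


class Price:
    """Hypothetical wrapper type, used only to show the delegation."""

    def __init__(self, value):
        self.value = value

    def log(self):
        # Called by np.log for each element of an object array.
        return math.log(self.value)


# Elements that provide a .log() method: np.log succeeds.
wrapped = np.array([Price(1.0), Price(math.e)], dtype=object)
result = np.log(wrapped)
print(result)

# Plain Python floats have no .log() method: np.log fails.
plain = np.array([1.0, 2.0], dtype=object)
try:
    np.log(plain)
    failed = False
except (AttributeError, TypeError) as exc:
    # Older NumPy raises AttributeError; newer releases may raise TypeError.
    failed = True
    print(type(exc).__name__, exc)
```

A Python float stored in an object array has no log method, so the lookup fails with the error shown above.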

You can address this by setting the dtype to float explicitly:

>>> np.log(s.astype('float64'))
0    0.0
dtype: float64

In your case:

np.log(df['price'].astype('float'))

Note: pd.to_numeric gives you more control over the conversion, for example over how values that cannot be parsed are handled.
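
A minimal sketch of the to_numeric route, assuming the price column ended up as object because it mixes numbers with strings. The sample data is made up for illustration; errors="coerce" turns entries that cannot be parsed into NaN instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a price column holding a float, a placeholder string,
# and a number stored as text, so pandas gives it dtype object.
df = pd.DataFrame({"price": [1.0, "NA", "3.0"]})
print(df["price"].dtype)

# Convert to a numeric dtype; unparseable entries become NaN.
price = pd.to_numeric(df["price"], errors="coerce")
print(price.dtype)

# np.log now operates on a float64 Series; NaN stays NaN.
df["ln_price"] = np.log(price)
print(df)
```

Whether coercing to NaN is appropriate depends on the data; with the default errors="raise", the same call fails loudly on the bad entry instead.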


First/alternative answer:

You have a float variable np in scope.

The problem is that:

import numpy as np
np = 1.
np.log

is syntactically valid Python, so nothing complains until np.log is evaluated at runtime:

>>> import numpy as np
>>> np = 1.
>>> np.log
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'

The solution is to avoid using np as a variable name, and likewise other popular import abbreviations such as pd or dt. A linter can catch this kind of error.
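
If you suspect the name has been rebound, a quick runtime check is to look at what np currently refers to. The helper below is a hypothetical sketch, not part of NumPy; it only tests that the object is a module with a callable log attribute.

```python
import types

import numpy as np


def looks_like_numpy(obj):
    """Return True if obj is a module that exposes a callable `log`."""
    return isinstance(obj, types.ModuleType) and callable(getattr(obj, "log", None))


print(type(np), looks_like_numpy(np))

# Stand-in for an accidental `np = 1.0` somewhere earlier in a script.
shadowed = 1.0
print(type(shadowed), looks_like_numpy(shadowed))
```

In an interactive session, simply evaluating type(np) gives the same information.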

Andy Hayden
  • This is a possible source of the error, but not the only one. The error also occurs if `df['price'].dtype` is `dtype('O')` (i.e. the data type is `object` instead of `float64`). This is not unusual when working with Pandas. – Warren Weckesser Jul 02 '19 at 21:16
  • @WarrenWeckesser Well, either way, we can 100% say that the problem is not part of the code you posted. Because your code works. – Finomnis Jul 02 '19 at 21:17
  • @Finomnis, I didn't post any code in this question. What code are you talking about? – Warren Weckesser Jul 02 '19 at 21:18
  • `df['ln_price'] = np.log(df['price'])` – Finomnis Jul 02 '19 at 21:19
  • You assumed that `df` is a Python dictionary with a `float` value. I can create an example that reproduces the error when `df` is a Pandas DataFrame. See my answer to the question I linked to in a comment on the question. – Warren Weckesser Jul 02 '19 at 21:20
  • @WarrenWeckesser `AttributeError: 'float' object has no attribute 'log'`, the code you cite would not give that error. – Andy Hayden Jul 02 '19 at 21:21
  • @WarrenWeckesser "You assumed that df is a Python dictionary with a float value", no. That specific error can only happen at `.log`... – Andy Hayden Jul 02 '19 at 21:23
  • @AndyHayden: `np.log(pd.Series(np.random.rand(10), dtype=object))`, where `pd` is `pandas`. – Warren Weckesser Jul 02 '19 at 21:24
  • Even simpler: `np.log(pd.Series([1.0, 'NA', 3.0]))` – Warren Weckesser Jul 02 '19 at 21:30
  • @WarrenWeckesser ooh, interesting. So it's to do with the dtype object, that's really weird. – Andy Hayden Jul 02 '19 at 21:55
  • @Finomnis looks like warren was correct: it's about the dtype! (I updated this answer to have some info), really weird. – Andy Hayden Jul 02 '19 at 22:01
  • Oh. What? That is super weird, but I think I don't know enough about pandas. I still think if OP used pandas, he should have mentioned it *somewhere*. – Finomnis Jul 02 '19 at 22:13

The problem is outside of the code that you posted. Your code works, at least if I assume that df is a dict. I cannot assume anything else, because your question does not specify it.

import numpy as np

df = {'price': 10.0}
df['ln_price'] = np.log(df['price'])

print(df)
{'price': 10.0, 'ln_price': 2.3025850929940459}
Finomnis
  • `df` is a common variable name for a `pandas.DataFrame`. Its indexing can look a lot like a dict's. – hpaulj Jul 02 '19 at 22:25