Taking logarithm of column

Question

I'm quite new to programming (in Python) and I would like to create a new variable that is the logarithm of a column (from an imported Excel file). I have tried different solutions from this site, but I keep getting an error. My latest error is AttributeError: 'str' object has no attribute 'log'. I have already dropped all the values that are not "numbers", but I still don't know how to convert the values from strings to integers (if that is the problem, because 'int(neighborhood)' doesn't work).

This is the code I have now:

import pandas as pd
import numpy as np

df=pd.read_excel("kwb-2016_del_col_del_row.xls")
df = df[df.m_woz != "."] # drop rows with values "."
neighborhood=df[df.recs=="Neighborhood"]
neighborhood=neighborhood["m_woz"]
print(neighborhood)

np.log(neighborhood)

and this is the error I'm getting:

AttributeError                            Traceback (most recent call last)
<ipython-input-66-46698de51811> in <module>()
     12 print(neighborhood)
     13 
---> 14 np.log(neighborhood)


AttributeError: 'str' object has no attribute 'log'

Could someone help me please?

But the way to solve it is two lines below that in comment `y=np.log(buurt["g_woz"])`. — Willem Van Onsem, Nov 10 '17 at 18:46
Thanks for the fast reply! That line gives an error as well (key error). If I remove the key from that line, I get the same error as I described above — Kate, Nov 10 '17 at 18:49
Then the column has another name than `g_woz`. Please do not just throw away parts that error, this is usually not a good way to debug code. — Willem Van Onsem, Nov 10 '17 at 18:49
I didn't know what to do with that error because I'm fairly sure that that is the name of the column — Kate, Nov 10 '17 at 18:51
@WillemVanOnsem I can't copy the whole error, but this is the beginning of the error: 'TypeError: an integer is required During handling of the above exception, another exception occurred: KeyError Traceback (most recent call last)' — Kate, Nov 10 '17 at 18:59
Please reset your kernel, it seems you've assigned `np` to something else along the way. — cs95, Nov 10 '17 at 18:59
What is the relationship between `buurt` and `neighborhood`? Also, it's a good idea to use `.loc` when indexing. — andrew_reece, Nov 10 '17 at 19:02
Excuse me, I changed it to English for this site, but I unfortunately forgot to change it in the error. Sorry for that. In the error buurt=neighborhood — Kate, Nov 10 '17 at 19:05
@Kate It seems you've glossed over my comment. Did you restart the kernel or not? — cs95, Nov 10 '17 at 19:10
@cᴏʟᴅsᴘᴇᴇᴅ yes, I did. twice, but still the same error — Kate, Nov 10 '17 at 19:12
@cᴏʟᴅsᴘᴇᴇᴅ Did this, and I still get the same error — Kate, Nov 10 '17 at 19:53

akubot · Answer 1 · 2017-11-11T04:40:34.837

Perhaps you are not removing the data you think you are?
Try printing the data types to see what they are.
In a DataFrame, your column might be filled with objects instead of numbers.

print(df.dtypes)

Also, you might want to look at these two pages:

Select row from a DataFrame based on the type of the object(i.e. str)

Pandas: convert dtype 'object' to int

Here's an example I constructed and ran interactively that correctly gets the logarithms (don't type >>>):

>>> raw_data = {'m_woz': [42, 'def', 1.23, 45.6, '.xyz'],
    'recs': ['Neighborhood', 'Neighborhood', 
    'unknown', 'Neighborhood', 'whatever']}
>>> df = pd.DataFrame(raw_data, columns = ['m_woz', 'recs'])
>>> print(df.dtypes)
m_woz    object
recs     object
dtype: object

Note that the dtype is object, not float, int, or str. An object column can hold a mix of Python types.
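Since `df.dtypes` only reports `object` for a mixed column, it can help to look at the type of each individual value. Here is a minimal sketch, assuming a small made-up DataFrame like the one in this example (the data is hypothetical, not the asker's file):

```python
import pandas as pd

# Hypothetical sample data mirroring the mixed column in this example
df = pd.DataFrame({"m_woz": [42, "def", 1.23, 45.6, ".xyz"]})

# Map every value to the name of its Python type, then count each name
type_counts = df["m_woz"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If every value shows up as `str`, the numbers were read in as text and need converting before `np.log` can work on them.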

Continuing on, here is what df and neighborhood look like:

>>> df
  m_woz          recs
0    42  Neighborhood
1   def  Neighborhood
2  1.23       unknown
3  45.6  Neighborhood
4  .xyz      whatever

>>> neighborhood=df[df.recs=="Neighborhood"]
>>> neighborhood

  m_woz          recs
0    42  Neighborhood
1   def  Neighborhood
3  45.6  Neighborhood

And here are the tricks... This line selects all rows in neighborhood whose m_woz value is an int or a float (be careful to fix indents if you copy/paste this):

>>> df_num_strings = neighborhood[neighborhood['m_woz'].
        apply(lambda x: type(x) in (int, float))]

>>> df_num_strings
  m_woz          recs
0    42  Neighborhood
3  45.6  Neighborhood
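One caveat with the type check above: it only keeps values that are already Python numbers. If the spreadsheet column was read in as text, numeric-looking strings such as "42" are dropped too. A small sketch with made-up values (the Series `s` is hypothetical) shows this:

```python
import pandas as pd

# Hypothetical column: one numeric string, one word, one real float
s = pd.Series(["42", "def", 45.6], dtype=object)

# Same test as in the answer: True only for actual int/float objects
is_number = s.apply(lambda x: type(x) in (int, float))
print(is_number.tolist())
```

Here only the last value passes, because "42" is a str even though it looks like a number.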

Almost there... convert the remaining values to floating point (going through str first):

>>> df_float = df_num_strings['m_woz'].astype(str).astype(float)
>>> df_float
0    42.0
3    45.6
Name: m_woz, dtype: float64

Finally, compute logarithms:

>>> np.log(df_float)
0    3.737670
3    3.819908
Name: m_woz, dtype: float64
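As an alternative sketch (not what was run above), `pd.to_numeric` with `errors="coerce"` can do the filtering and the conversion in one step. It also handles the case where every value is a string. The data below is made up to resemble the question; only the column names `m_woz` and `recs` come from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as strings, plus "." and other junk
df = pd.DataFrame({
    "m_woz": ["42", ".", "1.23", "45.6", "def"],
    "recs": ["Neighborhood", "Neighborhood", "unknown",
             "Neighborhood", "Neighborhood"],
})

# Select the m_woz values of the Neighborhood rows with .loc
neighborhood = df.loc[df["recs"] == "Neighborhood", "m_woz"]

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising; dropna() then removes those rows
values = pd.to_numeric(neighborhood, errors="coerce").dropna()

# values is now a float64 Series, so np.log works element-wise
log_values = np.log(values)
print(log_values)
```

Note that `np.log` assumes positive inputs; zeros or negative numbers would give `-inf` or `NaN` and may need separate handling.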

This worked! Except the part of the apply function, as all the values were not seen as floats or integers, but the 'astype' functions worked. I don't know why, but I have tried that one before and it didn't work then. Maybe I changed the code after I tried it. Thanks so much for your effort @coldspeed, WillemVanOnsem and of course akubot! — Kate, Nov 11 '17 at 11:35
@Kate great news, please mark the answer as correct if you don't mind — akubot, Nov 11 '17 at 14:11
