2

The problem I am stuck on is that I am unsure how to use a dictionary inside a function.

>>> play = pd.DataFrame(play)
>>> play
      No      Yes
0  Wednesday   Monday
1   Thursday  Tuesday
2   Saturday   Friday
>>> days = 'Monday','Tuesday','Wednesday','Thursday','Friday','Saturday'

>>> newdict = {}
>>> numbincrease = 0
>>> for i in days:
        newdict[i] = numbincrease
        numbincrease = numbincrease + 1


>>> print(newdict)
{'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3, 'Friday': 4, 'Saturday': 5}

I want to convert the days to numbers. The code should iterate over all the columns and all the rows, look each value up in the dictionary, and replace it with the corresponding number. The output should look like this:

>>> play
      No      Yes
0     2        0
1     3        1
2     5        4  

I have tried many ways to do this, but none of them seem to work. I have no idea how to write a function that iterates over each column, looks each value up in the dictionary, and then continues with the next row or column. Please help. Thank you.

jpp
Tia

3 Answers

4

pd.DataFrame.applymap can be used to apply a function to every value in a dataframe. In this case, the appropriate function is dict.get.

Note that, as shown below, you can build the day-to-integer mapping concisely using a dictionary comprehension with enumerate.

days = ('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday')

daymap = {v: k for k, v in enumerate(days)}

res = df.applymap(daymap.get)

print(res)

   No  Yes
0   2    0
1   3    1
2   5    4
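For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame `play` is reconstructed from the printed output in the question, so its construction is an assumption. Also note that `applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, so the sketch picks whichever method the installed version provides.

```python
import pandas as pd

# Assumption: rebuilt from the frame printed in the question.
play = pd.DataFrame({
    "No": ["Wednesday", "Thursday", "Saturday"],
    "Yes": ["Monday", "Tuesday", "Friday"],
})

days = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Map each day name to its position in the tuple.
daymap = {day: number for number, day in enumerate(days)}

# pandas >= 2.1 offers DataFrame.map; older versions only have applymap.
elementwise = play.map if hasattr(pd.DataFrame, "map") else play.applymap

# Passing the bound method daymap.get avoids an extra wrapper function.
res = elementwise(daymap.get)
print(res)
```

The result should match the table shown in the answer above.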
jpp
1

Try this. `.applymap` applies a function to every item in a dataframe:
play.applymap(lambda x: newdict[x])


Edit: with advice from @jpp:
play.applymap(newdict.get)

Dillon
  • In my opinion, you should never use `play.applymap(lambda x: newdict[x])`. This adds an additional and unnecessary Python-level function call. In addition, it will fail with `KeyError` if any value is not in your dictionary keys. – jpp Jun 22 '18 at 11:28
  • @jpp I want to understand what you mean by this. My thinking is that there are 2 steps: **1.** `lambda x:` and **2.** `newdict[x]` whereas `newdict.get` is just one step. Is that correct? _Side note, you've given me a lightbulb moment regarding the use of functions inside `applymap` (and `apply`/`map` etc) which makes the code faster and easier to understand. I have you to thank for this!_ – Dillon Jun 22 '18 at 11:35
  • Yes, you're correct. `apply` is just a *hidden loop*. It goes through each value and applies a function to it. `lambda x: newdict[x]` is effectively 2 function calls: one for `lambda` and one for getting a value from a key. Adding function calls can be expensive. – jpp Jun 22 '18 at 11:36
  • @Jpp So are you saying it is strictly better to use `play.applymap(my_func)` with `def my_func(x): return newdict[x]` than `play.applymap(lambda x: newdict[x])`? To me they look like the same thing. – Dillon Jun 22 '18 at 11:41
  • That's not what I said. I said: don't use 2 function calls. Both your examples have 2 function calls (one `lambda` + `dict.get`, the other `my_func` + `dict.get`). Just use `play.applymap(newdict.get)`. – jpp Jun 22 '18 at 11:43
  • @Jpp Oh I see. I now know what is meant by a function call, thank you! – Dillon Jun 22 '18 at 11:47
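The behavioural difference discussed in these comments can be seen without pandas at all. The following is a small sketch using a made-up two-entry dictionary and a made-up unknown value `"a day"`; the names `lookup_strict` and `lookup_lenient` are illustrative only.

```python
# Hypothetical, shortened mapping for illustration.
newdict = {"Monday": 0, "Tuesday": 1}

# Wrapper function: one extra Python-level call per value,
# and it raises KeyError for values missing from the dictionary.
lookup_strict = lambda x: newdict[x]

# Bound method: called directly, returns None for missing values.
lookup_lenient = newdict.get

values = ["Monday", "a day"]

lenient = [lookup_lenient(v) for v in values]

try:
    strict = [lookup_strict(v) for v in values]
    strict_error = None
except KeyError as exc:
    # The missing key is available as the exception's first argument.
    strict_error = exc.args[0]

print(lenient)
print(strict_error)
```

Here the lenient lookup yields `None` for the unknown value, while the strict lookup stops at the first missing key.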
1

You can do it like this:

play = play.replace(newdict)

The difference between replace and applymap is that when an element in your df is not in the dictionary, with replace the element will stay as it is:

df:

         No       Yes
  Wednesday    Monday
   Thursday   Tuesday
      a day    Friday 

with replace:

      No    Yes
0      2      0
1      3      1
2  a day      4

With applymap you will have NaN:

    No  Yes
0  2.0    0
1  3.0    1
2  NaN    4 
Joe
  • Note the [performance implications](https://stackoverflow.com/questions/49259580/replace-values-in-a-pandas-series-via-dictionary-efficiently), `replace` is slow. You should, in most cases, prefer `df.applymap(foo).fillna(df)` to `df.replace(foo)`. Having said this, the issue of unmapped values may not be relevant with OP's data. – jpp Jun 21 '18 at 10:10
  • @jpp Thanks for specifying. I just wanted to show a different option, but I agree that `applymap` is preferred. Anyway, I like your answer because it improves the code in general, +1 – Joe Jun 21 '18 at 10:12
  • @jpp So would you say that `newdict.get(x, np.nan)` is a faster solution if you want to handle unmapped values (instead of replace)? – Dillon Jun 22 '18 at 11:06
  • @Dillon, Exactly, you can just use `df.applymap(daymap.get)`. If something is unmapped, it'll become `None` or `NaN`, you don't need to specify the alternative value explicitly with Pandas. – jpp Jun 22 '18 at 11:08
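The `fillna` pattern mentioned in the comments can be sketched as follows, assuming the frame from the answer with the unmapped value `a day`. It maps with `dict.get`, then restores any unmapped entries from the original frame, and compares the result with `replace`. As before, `DataFrame.map` is used where available because `applymap` was deprecated in pandas 2.1. Exact dtypes may differ between pandas versions.

```python
import pandas as pd

# Assumption: the example frame from the answer, with one unmapped value.
df = pd.DataFrame({
    "No": ["Wednesday", "Thursday", "a day"],
    "Yes": ["Monday", "Tuesday", "Friday"],
})

days = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
daymap = {day: number for number, day in enumerate(days)}

# pandas >= 2.1 offers DataFrame.map; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Unmapped values come back as missing (None/NaN).
mapped = elementwise(daymap.get)

# Fill the missing cells with the original values from df.
kept = mapped.fillna(df)

# Same end result through replace, which leaves unmapped values untouched.
replaced = df.replace(daymap)

print(kept)
print(replaced)
```

Both `kept` and `replaced` keep `a day` in place; which one is faster on a given dataset is worth measuring rather than assuming.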