0

(This is a slightly different version of a question I asked earlier)

I have a dataframe that looks like this:

              INC_KEY               PTOCCUPATIONALINDUSTRY
96       170000016620                       Other Services
127      170000016651                       Other Services
170      170000016694                        Manufacturing
181      170000016706                         Construction
268      170000016793                       Other Services

I also have a CSV file, which I plan to turn into a dataframe, that looks like this (assume it is a dataframe):

CSV

My task is to convert the values in PTOCCUPATIONALINDUSTRY to the numbers that you see in the dictionary. So the output should look like this:

              INC_KEY               PTOCCUPATIONALINDUSTRY
96       170000016620                                   14
127      170000016651                                   14
170      170000016694                                    2
181      170000016706                                    8
268      170000016793                                   14

What is the easiest way to do this without manually doing if statements for each value? (I'm a newbie btw).

  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Chris Sep 18 '21 at 16:36
  • Not exactly - I need to replace the values, not create a new column. thank you, though. Unless there is some function of merge that I am not aware of. – Sean Roudnitsky Sep 18 '21 at 16:39
  • Maybe you could try merging, then renaming some columns :P – Chris Sep 18 '21 at 16:40
  • That would probably work fine, but I need to perform this operation over a large number of columns, so it would be extremely tedious. – Sean Roudnitsky Sep 18 '21 at 17:00

2 Answers2

1

Once you import your csv file in a DF, you can create a dictionary with keys being your label column, and your values being the start column, as below:

j = dict(zip(df_csv['Label'],df_csv['Start']))

>>> j

{'Finance, Insurance and Real Estate': 1,
 'Natoural Resources and Mining': 10,
 'Other Services': 14,
 'Construction': 2,
 'Government': 3}

Then, you can rewrite over your PTOCCUPATIONALINDUSTRY column, with the values from your dictionary using map:

df['PTOCCUPATIONALINDUSTRY']=df['PTOCCUPATIONALINDUSTRY'].map(j)

>>> df

        INC_KEY  PTOCCUPATIONALINDUSTRY
0  1.700000e+11                    14.0
1  1.700000e+11                    14.0
2  1.700000e+11                     NaN
3  1.700000e+11                     2.0
4  1.700000e+11                    14.0
sophocles
  • 13,593
  • 3
  • 14
  • 33
0

First, convert the Start and Label columns of your dataframe to a dictionary.

import pandas as pd

df = pd.DataFrame({'FmtName': ['PtOccupation', 'PtOccupation', 'PtOccupation'],
                   'Start': [1, 10, 11],
                   'Label': ['Finance...', 'Natural...', 'Info...']})

label_start_dict = pd.Series(df['Start'].values, index=df['Label']).to_dict()
print(label_start_dict)

Output:

{'Finance...': 1, 'Natural...': 10, 'Info...': 11}

After that, you can use map to replace values in the other dataframe, as in the answer to your previous question.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880