I have a dataset with a variable, NAICS Industry, represented by a 6-digit number. I want to narrow this number down to its first two digits so I can combine industries for a broader view. Once the industry code is two digits instead of six, I want to use value counts to count the total number of loans that fall within each NAICS industry code. Can someone please help? I have attached pictures for reference.
-
A clarification question: will a NAICS code in your dataset ever have less than six digits, and/or leading zero digits? – nanofarad Aug 21 '22 at 15:59
-
@nanofarad no; NAICS codes will have 6 digits in my dataset. – DopeNAnalytical Aug 21 '22 at 16:11
-
"I have attached pictures for reference." [Please do not do this for simple text data](https://meta.stackoverflow.com/questions/285551). – Karl Knechtel Aug 24 '22 at 00:48
1 Answer
The best approach depends on the data type of the NAICS data (which I can't tell from the screenshot alone) and assumptions about the number of digits.
Assuming that the dataset contains only six-digit NAICS codes in integer format (that is, `df['NAICS'].dtype` is `int64` or similar), the first two digits can be obtained by dividing the NAICS code by 10000 using integer division:
df['NAICS_sector'] = df['NAICS'] // 10000
Note that you must use `//` (integer division) and not `/` (floating-point division).
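To illustrate the difference between the two operators, here is a minimal sketch. The sample values are made up for demonstration, and the column name `NAICS` is assumed to match your dataframe:

```python
import pandas as pd

# Hypothetical sample data; the real column name and values may differ.
df = pd.DataFrame({"NAICS": [451120, 722410, 451120, 621210]})

# Integer division drops the last four digits and keeps an integer dtype.
df["NAICS_sector"] = df["NAICS"] // 10000

# Floating-point division keeps the fractional part (e.g. 45.112),
# which is not usable as a sector code.
as_float = df["NAICS"] / 10000

print(df)
print(as_float)
```

With six-digit codes, the `NAICS_sector` column then holds two-digit integers that can be counted or grouped directly.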
If the NAICS codes are in the dataframe in string format (that is, `df['NAICS'].dtype` says `object`), you can use string manipulation instead:
df['NAICS_sector'] = df['NAICS'].str.slice(stop=2)
Setting `stop=2` means that the first two characters are returned from each entry. The parameters of the `slice` method are explained in the official Pandas documentation.
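A small sketch of the string approach, assuming the codes are stored as text. The sample values are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data with NAICS codes stored as strings.
df = pd.DataFrame({"NAICS": ["451120", "722410", "621210"]})

# Keep only the first two characters of each code.
df["NAICS_sector"] = df["NAICS"].str.slice(stop=2)

# Indexing the .str accessor with a slice is an equivalent shorthand.
shorthand = df["NAICS"].str[:2]

print(df)
```

Note that the result is still a string column, so the sectors are text such as "45" rather than the integer 45.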
Finally, if your dataset contains integers but you cannot guarantee they all have the same length, you'll want to use string manipulation anyway: convert the column to strings first (for example with `astype(str)`), then apply the string-slicing approach from the second example.
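For the mixed-length case, a sketch along these lines should work. The sample codes of varying length are hypothetical, and the example assumes the column has no missing values:

```python
import pandas as pd

# Hypothetical integer codes of different lengths (assumes no missing values).
df = pd.DataFrame({"NAICS": [451120, 7224, 62]})

# Convert to strings first, then take the first two characters.
df["NAICS_sector"] = df["NAICS"].astype(str).str.slice(stop=2)

# Optionally convert back to integers if a numeric sector code is preferred.
df["NAICS_sector_int"] = df["NAICS_sector"].astype(int)

print(df)
```

Integer division by 10000 would give wrong results here (for example, `7224 // 10000` is 0), which is why the string route is the safer choice when lengths vary.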
After all this is done, you can group by the new `NAICS_sector` column, or call `value_counts()` on it to count the loans in each sector.
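As a rough end-to-end sketch of the counting step, assuming one row per loan and integer six-digit codes (the sample data is invented):

```python
import pandas as pd

# Hypothetical data: one row per loan, with a six-digit integer NAICS code.
df = pd.DataFrame({"NAICS": [451120, 722410, 451120, 621210, 722511]})

# Derive the two-digit sector code.
df["NAICS_sector"] = df["NAICS"] // 10000

# Count the number of loans per sector, most frequent first.
loans_per_sector = df["NAICS_sector"].value_counts()

# groupby(...).size() gives the same counts, ordered by sector code instead.
loans_by_group = df.groupby("NAICS_sector").size()

print(loans_per_sector)
print(loans_by_group)
```

`value_counts()` is the shorter option when only counts are needed; `groupby` is more convenient if you also want other aggregates, such as sums or means of loan amounts, per sector.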

-
Thank you so much; I hope to be just as great as you one day!! You solved this frustrating problem for me in no time; I appreciate it. Have a great day! :) Btw, the data type was int64, so the first solution worked great. Thanks again. – DopeNAnalytical Aug 21 '22 at 17:08