How to access index of string value in a cell of pandas data frame?

Question

I'm working with the Bureau of Labor Statistics data which looks like this:

series_id           year    period         value
CES0000000001       2006    M01            135446.0

series_id[3] and series_id[4] (the 4th and 5th characters) indicate the supersector. For example, CES10xxxxxx01 would be Mining & Logging. There are 15 supersectors that I'm concerned with, so I want to create 15 separate data frames, one per supersector, to perform time series analysis. I'm trying to access each value as a list to achieve something like:

# *pseudocode*:
mining_and_logging = df[df.series_id[3]==1 and df.series_id[4]==0]

Can I avoid writing a for loop where I convert each value to a list then access by index and add the row to the new dataframe?

How can I achieve this?

Let me clarify. You're trying to take the two numbers that come after `CES` and split your dataframe into 15 different dataframes according to those codes? — Juan C, Aug 06 '19 at 19:40
Please take a look at [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a more robust sample input, and your preferred sample output. You also may find pandas [series.str.slice](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.slice.html) helpful — G. Anderson, Aug 06 '19 at 19:42

score 0 · Answer 1 · answered Aug 06 '19 at 20:29

One way to do what you want and store the dataframes in a dictionary through a for loop could be:

First, create an auxiliary column to make your life easier:

df['id'] = df['series_id'].str[3:5]  # Extract characters 3 and 4 of every string (counting from zero)

Then, you create an empty dictionary and populate it:

dict_df = {}
for unique_id in df.id.unique():
    dict_df[unique_id] = df[df.id == unique_id]

Now you'll have a dictionary with 15 dataframes inside. For example, if you want to call the dataframe associated with id = 01, you just do:

dict_df['01']
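
If only a single supersector is needed, the filter from the question's pseudocode can also be written directly as a boolean mask, without the dictionary. This is a minimal sketch, assuming a small made-up sample frame (only the first row comes from the question) and the code "10" that the question gives for Mining & Logging:

```python
import pandas as pd

# Illustrative sample: the first row comes from the question,
# the other rows use made-up placeholder values.
df = pd.DataFrame({
    "series_id": ["CES0000000001", "CES1000000001", "CES1000000001"],
    "year": [2006, 2006, 2006],
    "period": ["M01", "M01", "M02"],
    "value": [135446.0, 1.0, 2.0],
})

# Compare characters 3 and 4 of every series_id against the supersector code.
# "10" is the Mining & Logging code mentioned in the question.
mining_and_logging = df[df["series_id"].str[3:5] == "10"]
print(mining_and_logging)
```

The comparison is done on strings, so the code has to be quoted ("10", not 10).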

Hope it helps!

Thanks, the first part doesn't work, as df['series_id'][3:5] selects the 3rd and 4th rows, but the second part worked! — 3venthoriz0n, Aug 07 '19 at 00:06

score 0 · Answer 2 · answered Aug 07 '19 at 00:11

Solved it by combining answers from Juan C and G. Anderson.

Select the characters at positions 3 and 4 (counting from zero):

    df['id'] = df.series_id.str.slice(start=3, stop=5)
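
To see why the `.str` accessor matters here, the sketch below contrasts plain slicing of the column (which picks rows) with string slicing (which picks characters inside each value). The sample frame is an assumption for illustration; only its first row comes from the question, the rest are placeholders:

```python
import pandas as pd

# Illustrative sample: the first row comes from the question,
# the other rows use made-up placeholder values.
df = pd.DataFrame({
    "series_id": ["CES0000000001", "CES1000000001", "CES1000000001",
                  "CES2000000001", "CES2000000001"],
    "value": [135446.0, 1.0, 2.0, 3.0, 4.0],
})

# Plain slicing on the column selects rows by position (here rows 3 and 4).
row_slice = df["series_id"][3:5]

# The .str accessor applies the slice to each string instead.
codes = df["series_id"].str.slice(start=3, stop=5)

# .str[3:5] is shorthand for the same per-string slice.
codes_short = df["series_id"].str[3:5]

print(row_slice)
print(codes)
```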

And then the following to create dataframes:

    dict_df = {}
    for unique_id in df.id.unique():
        dict_df[unique_id] = df[df.id == unique_id]
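
The loop above could also be expressed with `groupby`, which splits the frame in one pass and does not need the helper column. This is a sketch rather than the method used in the answer, and the sample frame is assumed for illustration (only its first row comes from the question):

```python
import pandas as pd

# Illustrative sample: the first row comes from the question,
# the other rows use made-up placeholder values.
df = pd.DataFrame({
    "series_id": ["CES0000000001", "CES1000000001", "CES1000000001",
                  "CES2000000001"],
    "year": [2006, 2006, 2006, 2006],
    "period": ["M01", "M01", "M02", "M01"],
    "value": [135446.0, 1.0, 2.0, 3.0],
})

# Two-character supersector code taken from each series_id.
supersector = df["series_id"].str[3:5]

# Group rows by that code and collect one dataframe per supersector.
dict_df = {code: group for code, group in df.groupby(supersector)}

print(sorted(dict_df))
print(dict_df["10"])
```

Each value in `dict_df` keeps the original row index, so the pieces can be traced back to `df` if needed.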
