
I'm working with Bureau of Labor Statistics (BLS) data, which looks like this:

    series_id           year    period         value
    CES0000000001       2006    M01            135446.0

Characters 3 and 4 of series_id (counting from zero) indicate the supersector. For example, CES10xxxxxx01 would be Mining & Logging. There are 15 supersectors that I'm concerned with, so I want to create 15 separate data frames, one per supersector, to perform time series analysis. I'm trying to access each value as a list to achieve something like:

*Pseudocode:*

    mining_and_logging = df[df.series_id[3]==1 and df.series_id[4]==0]

Can I avoid writing a for loop where I convert each value to a list then access by index and add the row to the new dataframe?

How can I achieve this?

3venthoriz0n
  • Let me clarify. You're trying to take the two numbers that come after `CES` and split your dataframe into 15 different dataframes according to those codes? – Juan C Aug 06 '19 at 19:40
  • Please take a look at [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a more robust sample input, and your preferred sample output. You also may find pandas [series.str.slice](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.slice.html) helpful – G. Anderson Aug 06 '19 at 19:42
  • @JuanC Yes that's exactly what I'm trying to do. – 3venthoriz0n Aug 06 '19 at 19:46
  • Thanks @G.Anderson ! series.str.slice was helpful! – 3venthoriz0n Aug 07 '19 at 00:07

2 Answers


One way to do what you want and store the dataframes iteratively with a for loop could be:

First, create an auxiliary column to make your life easier:

    df['id'] = df['series_id'][3:5]  # Extract characters 3 and 4 of every string (counting from zero)

Then, you create an empty dictionary and populate it:

    dict_df = {}
    for unique_id in df.id.unique():
        dict_df[unique_id] = df[df.id == unique_id]

Now you'll have a dictionary with 15 dataframes inside, one per id. For example, to access the dataframe associated with id = 01, you just do:

    dict_df['01']

Hope it helps!

Juan C
  • Thanks, the first part doesn't work, as `df['series_id'][3:5]` selects rows 3 and 4, but the second part worked! – 3venthoriz0n Aug 07 '19 at 00:06
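The difference the comment points out can be seen on a small Series. This is a sketch with hypothetical series IDs in the question's format (only CES0000000001 comes from the question). Slicing the Series itself picks rows by position, while the `.str` accessor applies the slice to each string. `.iloc[3:5]` is used here as the explicit positional form of `df['series_id'][3:5]` on the default integer index.

```python
import pandas as pd

# Hypothetical series IDs in the question's format; only CES0000000001
# appears in the question, the rest are illustrative.
s = pd.Series(
    ["CES0000000001", "CES1000000001", "CES2000000001",
     "CES3000000001", "CES4000000001"],
    name="series_id",
)

# Positional ROW slice: rows 3 and 4 (counting from zero), which is what
# df['series_id'][3:5] selects on a default RangeIndex.
rows = s.iloc[3:5]

# The .str accessor slices EACH string instead: characters 3 and 4 of every ID.
codes = s.str[3:5]

print(rows.tolist())   # ['CES3000000001', 'CES4000000001']
print(codes.tolist())  # ['00', '10', '20', '30', '40']
```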

Solved it by combining answers from Juan C and G. Anderson.

Select characters 3 and 4 (counting from zero):

    df['id'] = df.series_id.str.slice(start=3, stop=5)
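With the `id` column in place, a single supersector (the question's `mining_and_logging` example) can be selected with a boolean mask. The sketch below assumes hypothetical rows in the question's format with placeholder values, not real BLS figures. It also shows the question's pseudocode in valid form: element-wise conditions on a Series are combined with `&` rather than `and`, each condition is parenthesized, and the characters are compared as strings (`"1"`, `"0"`), not integers.

```python
import pandas as pd

# Hypothetical rows in the question's format; values other than the
# question's 135446.0 are placeholders, not real BLS figures.
df = pd.DataFrame({
    "series_id": ["CES0000000001", "CES1000000001", "CES1000000001", "CES2000000001"],
    "year": [2006, 2006, 2006, 2006],
    "period": ["M01", "M01", "M02", "M01"],
    "value": [135446.0, 1.0, 2.0, 3.0],
})
df["id"] = df.series_id.str.slice(start=3, stop=5)

# Select one supersector by comparing the two-character code (a string).
mining_and_logging = df[df["id"] == "10"]

# The question's pseudocode made valid: & instead of `and`,
# parentheses around each condition, string comparisons.
same_rows = df[(df.series_id.str[3] == "1") & (df.series_id.str[4] == "0")]
```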

Then use the following to create the dataframes:

    dict_df = {}
    for unique_id in df.id.unique():
        dict_df[unique_id] = df[df.id == unique_id]
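The same dictionary can also be built without an explicit loop by letting `groupby` split the frame. Iterating a `groupby` object yields `(key, sub-DataFrame)` pairs, so `dict(tuple(...))` gives the same `{code: DataFrame}` mapping as the loop over `unique()`. This is a sketch under the same assumptions as before: the rows are hypothetical and the values are placeholders.

```python
import pandas as pd

# Hypothetical rows in the question's format (placeholder values).
df = pd.DataFrame({
    "series_id": ["CES0000000001", "CES1000000001", "CES1000000001", "CES2000000001"],
    "year": [2006, 2006, 2006, 2006],
    "period": ["M01", "M01", "M02", "M01"],
    "value": [135446.0, 1.0, 2.0, 3.0],
})
df["id"] = df.series_id.str.slice(start=3, stop=5)

# groupby yields (code, sub-DataFrame) pairs; dict() collects them into
# the same {code: DataFrame} mapping the for loop builds.
dict_df = dict(tuple(df.groupby("id")))

# Print each supersector code with its number of rows.
for code, frame in dict_df.items():
    print(code, len(frame))
```

Each sub-DataFrame keeps the original row index and the `id` column, so `dict_df['10']` matches `df[df.id == '10']` row for row.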
3venthoriz0n