
Here I have a dataframe and an array. Please note that this dataframe has only 1 column; "ds" and "code" are indices. My goal is to remove all the stocks in this dataframe that are not included in the array. The provided link does not actually help. Suppose this dataframe is named df:

                    dividendyield
ds       code                    
20200601 000001.SZ           1.64
         000002.SZ           3.96
         000004.SZ           0.00
         000005.SZ           0.00
         000006.SZ           3.68
...                           ...
         688516.SH           0.00
         688566.SH           0.00
         688588.SH           0.40
         688598.SH           0.00

[3837 rows x 1 columns]
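To make the structure concrete, here is a minimal, hypothetical stand-in for df built from a few of the rows printed above (the real frame has 3837 rows). Assuming the real df has the same layout, its index is a two-level pandas MultiIndex named ("ds", "code"), which is why pandas reports only one column.

```python
import pandas as pd

# Hypothetical stand-in for df, using a few rows from the printout above.
index = pd.MultiIndex.from_tuples(
    [
        (20200601, "000001.SZ"),
        (20200601, "000002.SZ"),
        (20200601, "000004.SZ"),
        (20200601, "688588.SH"),
    ],
    names=["ds", "code"],
)
df = pd.DataFrame({"dividendyield": [1.64, 3.96, 0.00, 0.40]}, index=index)

print(type(df.index).__name__)  # MultiIndex
print(df.index.names)           # FrozenList(['ds', 'code'])
print(df.shape)                 # (4, 1): "ds" and "code" are index levels, not columns
```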

I also have an array like this, which consists of 2000 stock codes; suppose this array is named "stk_code":

['000001.SZ' '000002.SZ' '000004.SZ' ... '603992.SH' '603993.SH'
 '603997.SH']

When I use

df = df.reindex(stk_code)

It raises

TypeError: Expected tuple, got str

Can anyone help me with this error? Thanks a lot!
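A sketch of one way to do the filtering, assuming df has a (ds, code) MultiIndex as printed. On a two-level index each row label is a (ds, code) tuple, so a flat array of code strings passed to reindex does not match those labels, which is roughly what the "Expected tuple, got str" message is complaining about. Filtering on the "code" level with get_level_values and Index.isin avoids reindex entirely. The small df and stk_code below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: a (ds, code) MultiIndex and one column.
index = pd.MultiIndex.from_product(
    [[20200601], ["000001.SZ", "000002.SZ", "000004.SZ", "688588.SH"]],
    names=["ds", "code"],
)
df = pd.DataFrame({"dividendyield": [1.64, 3.96, 0.00, 0.40]}, index=index)

# Hypothetical stand-in for stk_code; "603993.SH" does not occur in this df.
stk_code = np.array(["000001.SZ", "000004.SZ", "603993.SH"])

# Boolean mask over rows: True where the row's "code" level value is in stk_code.
mask = df.index.get_level_values("code").isin(stk_code)
filtered = df[mask]
print(filtered)
```

Unlike reindex, this mask never adds rows: codes in stk_code that do not appear in df are simply ignored rather than producing NaN rows.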

Cooper
  • Sorry, I submitted by mistake. Are you using a multi-index? How about a different approach: create a filter expression, then use .loc[] to filter. `df = pd.DataFrame(data={'ds':'a a a a b c d e f f f f'.split()})` `keep = ['a','f']` `filt = [c in keep for c in df['ds'].values]` then `df.loc[filt,:]` <-- returns the rows as requested. – Paul Wilson Jan 28 '21 at 01:59
  • I am not using multi-indexing. The dataframe is obtained from a dataset, and I just want to reindex the dataframe with that array. – Cooper Jan 28 '21 at 02:06
  • The dataframe shown in the example is in fact a MultiIndex dataframe. Note: `[3837 rows x 1 columns]` says 1 column, which means the other labels (`ds` and `code`) are levels of a MultiIndex. – Trenton McKinney Jan 28 '21 at 02:13
  • Indeed. I suspect this answer will be of use to you. You may need to use regex to clean the 'ds' index. https://stackoverflow.com/questions/53022580/handling-error-typeerror-expected-tuple-got-str-loading-a-csv-to-pandas-mult – Paul Wilson Jan 28 '21 at 02:16
  • To be honest, I did not find useful information in the link you provided. – Cooper Jan 28 '21 at 02:23
  • `idx = pd.IndexSlice` and `df.loc[idx[:, ['stock1', 'stock2']], :]` from `question 1b` of the accepted answer works just fine to answer your question. – Trenton McKinney Jan 28 '21 at 02:41
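The IndexSlice approach from the comment above can be sketched as follows, on a hypothetical two-date stand-in for df. As a precaution, the sketch first intersects stk_code with the codes actually present in the "code" level, since recent pandas versions may raise a KeyError when a list passed to .loc on a MultiIndex level contains labels missing from that level.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df with two dates and a (ds, code) MultiIndex.
index = pd.MultiIndex.from_product(
    [[20200601, 20200602], ["000001.SZ", "000002.SZ", "688588.SH"]],
    names=["ds", "code"],
)
df = pd.DataFrame({"dividendyield": [1.64, 3.96, 0.40, 1.64, 3.96, 0.40]}, index=index)

# Hypothetical stand-in for stk_code; "603993.SH" is absent from this df.
stk_code = np.array(["000001.SZ", "688588.SH", "603993.SH"])

# Keep only codes that occur in the "code" level (both sides are unique here).
present = pd.Index(stk_code).intersection(df.index.unique(level="code"))

# All dates (":") on level 0, only the listed codes on level 1, all columns.
idx = pd.IndexSlice
filtered = df.loc[idx[:, list(present)], :]
print(filtered)
```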

0 Answers