
I have trimmed strings within a column to isolate keywords and create a DataFrame (totalarea_cols), which I then want to use to label the headers of a second DataFrame (totalarea_p).

However, the keywords appear to be created as tuples, and when they are used to label the columns of the second DataFrame, the tuple syntax is included (see the sample of totalarea_p.head() below).

Here is a sample of the code:

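```python
# Codes of the metadata rows whose label mentions "Total area under dry season "
totalarea_code = df_meta_p2.loc[df_meta_p2['Label English'].str.contains('Total area under dry season '), 'Code']
# Keyword that follows the prefix in each matching label (extractall returns a DataFrame)
totalarea_cols = df_meta_p2['Label English'].str.extractall('Total area under dry season (.*)').reset_index(drop=True)
# Data columns selected by those codes
totalarea_p = df_data_p2.loc[:, totalarea_code]
# Rename the headers with the extracted keywords (this is where the tuple headers appear)
totalarea_p.columns = totalarea_cols
```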
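To see where the odd headers come from, it helps to look at what `str.extractall` returns. A minimal sketch, using a few hypothetical labels in place of `df_meta_p2['Label English']`: the result is a DataFrame with one column per capture group (labelled `0`) and a MultiIndex of (original row, match number), not a 1-D Series, so handing the whole frame to `.columns` is not the same as handing it a flat list of names.

```python
import pandas as pd

# Toy labels standing in for df_meta_p2['Label English'] (hypothetical data)
labels = pd.Series([
    "District code",
    "Total area under dry season groundnut (peanut)",
    "Total area under dry season upland rice",
])

# Rows without a match are dropped; each match becomes one row of a DataFrame
matches = labels.str.extractall("Total area under dry season (.*)")

print(type(matches).__name__)       # DataFrame
print(list(matches.columns))        # [0]  (one column per capture group)
print(list(matches.index.names))    # [None, 'match']

# Column 0 is the 1-D Series of keywords that is suitable for .columns
keywords = matches.reset_index(drop=True)[0]
print(keywords.tolist())            # ['groundnut (peanut)', 'upland rice']
```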

Sample of the metadata from which I would like to extract the keywords:

```
In[33]: df_meta_p2['Label English']
Out[33]: 
0                                          District code
1                                          Province code
2                               Province name in English
3                               District name in English
4                                   Province name in Lao
5         Total area under dry season groundnut (peanut)
6      Total number of households growing dry season ...
7      Total number of households growing dry season ...
8      Total number of households growing dry season ...
9      Total number of households growing dry season ...
10     Total number of households growing dry season ...
11     Total number of households growing dry season yam
12     Total number of households growing dry season ...
13     Total number of households growing dry season ...
14     Total number of households growing dry season ...
15     Total number of households growing dry season ...
16     Total number of households growing dry season ...
17     Total number of households growing dry season ...
18     Total number of households growing dry season ...
19     Total number of households growing dry season ...
Name: Label English, dtype: object
```

Sample of the DataFrame returned by `str.extractall`:

```
In [34]: totalarea_cols
Out[34]: 
                                                   0
0                                 groundnut (peanut)
1                       lowland rice/irrigation rice
2                                        upland rice
3                                             potato
4                                       sweet potato
5                                            cassava
6                                                yam
7                                               taro
8                   other tuber, root and bulk crops
9                                          mungbeans
10                                            cowpea
11                                        sugar cane
12                                           soybean
13                                            sesame
14                                            cotton
15                                           tobacco
16                           vegetable not specified
17                                           cabbage
```

Sample of the column headers after substituting them into the second DataFrame, totalarea_p:

```
In [36]:  totalarea_p.head()
Out[36]: 
   (groundnut (peanut),)  (lowland rice/irrigation rice,)  (upland rice,)  \
0                    0.0                             0.00               0   
1                    0.0                             0.00               0   
2                    0.0                             0.00               0   
3                    0.0                             0.30               0   
4                    0.0                             1.01               0   

   (potato,)  (sweet potato,)  (cassava,)  (yam,)  (taro,)  \
0        0.0             0.00         0.0     0.0        0   
1        0.0             0.00         0.0     0.0        0   
2        0.0             0.52         0.0     0.0        0   
3        0.0             0.01         0.0     0.0        0   
4        0.0             0.00         0.0     0.0        0   
```

I have spent the better part of a day searching for an answer but, other than the post found here, I am coming up blank. Any ideas?

bugguts
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jan 31 '18 at 09:36
  • Thank you. I have taken the time to read the suggestions that you have posted. I take it that you cannot understand what I mean? – bugguts Jan 31 '18 at 09:40
  • Hmm, if you want to convert only one column of tuples to scalars, use `df['col'] = df['col'].str[0]`, but without data it is hard to know whether that is what you need. – jezrael Jan 31 '18 at 09:41
  • Thank you. I think you need `totalarea_p.columns = totalarea_cols[0]` or `totalarea_p.columns = totalarea_cols['0']` – jezrael Jan 31 '18 at 12:59
  • @jezrael I have amended my question. Hopefully it is clearer now. – bugguts Jan 31 '18 at 12:59
  • Yes, please check comment above ;) – jezrael Jan 31 '18 at 13:00
  • What a champ! First suggestion worked a treat. Thanks so much! – bugguts Jan 31 '18 at 13:01
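The `.str[0]` idiom from the comments can be sketched on a hypothetical column of 1-tuples (toy data shaped like the headers in the question, not the asker's data): it takes element 0 of each tuple and leaves plain strings.

```python
import pandas as pd

# Hypothetical column whose cells are 1-tuples
s = pd.Series([("groundnut (peanut)",), ("upland rice",)])
df = pd.DataFrame({"col": s})

# .str[0] indexes into each element, so every tuple becomes its first item
df["col"] = df["col"].str[0]
print(df["col"].tolist())  # ['groundnut (peanut)', 'upland rice']
```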

1 Answer


You need to select column `0` to get a Series, so change the code to:

```python
totalarea_p.columns = totalarea_cols[0]
```

Or select it by position with `iloc`:

```python
totalarea_p.columns = totalarea_cols.iloc[:, 0]
```
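An end-to-end sketch of the fix, assuming small hypothetical stand-ins for `df_meta_p2` and `df_data_p2` (the codes `D1`, `A1`, `A2`, `H1` and all values are invented for illustration). Both selections from the answer yield a 1-D Series of strings, so the headers come out as plain labels. The approach relies on the codes and the extracted keywords being in the same order, which holds here because both are taken from the same matching metadata rows.

```python
import pandas as pd

# Hypothetical metadata and data frames (toy values, not the asker's data)
df_meta_p2 = pd.DataFrame({
    "Code": ["D1", "A1", "A2", "H1"],
    "Label English": [
        "District code",
        "Total area under dry season groundnut (peanut)",
        "Total area under dry season upland rice",
        "Total number of households growing dry season yam",
    ],
})
df_data_p2 = pd.DataFrame({
    "D1": [101, 102],
    "A1": [0.0, 0.3],
    "A2": [0.0, 1.01],
    "H1": [3, 4],
})

prefix = "Total area under dry season "
totalarea_code = df_meta_p2.loc[df_meta_p2["Label English"].str.contains(prefix), "Code"]

# DataFrame with a single column labelled 0 holding the keywords
totalarea_cols = (
    df_meta_p2["Label English"]
    .str.extractall(prefix + "(.*)")
    .reset_index(drop=True)
)

# Option 1: select column 0 by label
totalarea_p = df_data_p2.loc[:, totalarea_code]
totalarea_p.columns = totalarea_cols[0]

# Option 2: select the first column by position
totalarea_p_iloc = df_data_p2.loc[:, totalarea_code]
totalarea_p_iloc.columns = totalarea_cols.iloc[:, 0]

print(totalarea_p)
```

Because the selected Series is named `0`, the new column index also carries the name `0`; assigning `totalarea_cols[0].tolist()` instead gives plain labels without that index name.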
jezrael