2

Simplified situation:

I've got a file with list of some countries and I load it to dataframe df. Then I've got data concerning those countries (and many more) in many .xls files. I try to read each of those files to df_f, subset the data I'm interested in and then find countries from the original file and if any of them is present, copy the data to dataframe df.

The problem is that only some of the values are assigned correctly. Most of them are inserted as NaNs. (see below)

for filename in os.listdir(os.getcwd()):
    df_f = pd.read_excel(filename, sheetname = 'Data', parse_cols = "D,F,H,J:BS", skiprows = 2, skip_footer = 2)
    df_f = df_f.fillna(0)

    df_ss = [SUBSETTING df_f here]

    countries = df_ss['Country']

    for c in countries:
        if (c in df['Country'].values):
            row_idx = df[df['Country'] == c].index

            df_h = df_ss[quarters][df_ss.Country == c]
            df.loc[row_idx, quarters] = df_h

The result I get is:

Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  Q2 2001  Q3 2001  \
0     Albania      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1     Algeria      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2   Argentina      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
3     Armenia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
4   Australia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
5     Austria  4547431  5155839  5558963  6079089  6326217  6483130  6547780   
6  Azerbaijan      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
etc...

The loading and subsetting is done correctly, data is not corrupted - I print df_h for each iteration and it shows regular numbers. The point is that after assigning them to df dataframe they become NaNs...

Any idea?

EDIT: sample data

df:

          Country Country group  Population  Development coefficient  Q1 2000  \
0     Albania      group II     2981000                       -1        0   
1     Algeria       group I    39106000                       -1        0   
2   Argentina     group III    42669000                       -1        0   
3     Armenia      group II     3013000                       -1        0   
4   Australia      group IV    23520000                       -1        0   
5     Austria      group IV     8531000                       -1        0   
6  Azerbaijan      group II     9538000                       -1        0   
7  Bangladesh       group I   158513000                       -1        0   
8     Belarus     group III     9470000                       -1        0   
9     Belgium     group III    11200000                       -1        0   

 (...)

   Q2 2013  Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
0        0        0        0        0        0        0        0        0  
1        0        0        0        0        0        0        0        0  
2        0        0        0        0        0        0        0        0  
3        0        0        0        0        0        0        0        0  
4        0        0        0        0        0        0        0        0  
5        0        0        0        0        0        0        0        0  
6        0        0        0        0        0        0        0        0  
7        0        0        0        0        0        0        0        0  
8        0        0        0        0        0        0        0        0  
9        0        0        0        0        0        0        0        0

and df_ss of one of files:

    Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  \
5                       Guam    11257    17155    23063    29150    37098   
10                  Kiribati      323      342      361      380      398   
15          Marshall Islands      425      428      433      440      449   
17                Micronesia        0        0        0        0        0   
19                     Nauru        0        0        0        0        0   
22  Northern Mariana Islands     2560     3386     4499     6000     8037   
27                     Palau     1513     1672     1828     1980     2130   

(...) 

    Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
5    150028   151152   152244   153283   154310   155333   156341  
10    19933    20315    20678    21010    21329    21637    21932  
15    17536    19160    20827    22508    24253    26057    27904  
17    18646    17939    17513    17232    17150    17233    17438  
19     7894     8061     8227     8388     8550     8712     8874  
22    27915    28198    28481    28753    29028    29304    29578  
27    17602    17858    18105    18337    18564    18785    19001  
RedFidd
  • 53
  • 4
  • It looks like all you're doing is just filling/adding/merging the quarters data, why not just do a merge `df.merge(df_h, on='Country')`? – EdChum Jun 23 '15 at 08:09
  • @EdChum I tried that, but merging adds more of those Q1 2000, Q2 2000 ... columns instead of fitting data into those already existing – RedFidd Jun 23 '15 at 08:15
  • I don't understand what you're saying, but basically you should be able to merge the data, can you post raw input data for both dfs and the desired output – EdChum Jun 23 '15 at 08:17
  • @EdChum - after merging I get columns: `Country Q1 2001_x Q2 2001_x ... Q1 2001_y Q2 2002_y ...`. Values from df in x columns, values from df_h in y columns. Each iteration adds another pair of x-y columns – RedFidd Jun 23 '15 at 08:27
  • This means that the columns clash can you try `df_h.combine_first(df)`, I say again post sample data as this will become tedious trying to debug for you – EdChum Jun 23 '15 at 08:28
  • 1
    Can you try `df.loc[row_idx, quarters] = df_h.values` for the last line (note the extra `.values` at the end)? But I agree with the above that there is probably a better solution, and it would help to get a reproducible example. – joris Jun 23 '15 at 08:44
  • @joris this worked!! thanks – RedFidd Jun 23 '15 at 08:52

2 Answers2

0

Try setting the values like the following (see this post):

df.ix[quaters,...] = 10
Community
  • 1
  • 1
jhoepken
  • 1,842
  • 3
  • 17
  • 24
  • Nothing changed - result is the same (values for the same countries were assigned while others got NaNs) – RedFidd Jun 23 '15 at 08:23
  • Maybe you could give an example, that can be copy-pasted easily, without the access to the Excel data sheet. – jhoepken Jun 23 '15 at 08:34
0

By @joris:

Can you try df.loc[row_idx, quarters] = df_h.values for the last line (note the extra .values at the end)?

This one worked, thanks :-)

RedFidd
  • 53
  • 4