1

I have the following df data in pandas:

    import pandas
    data = pd.DataFrame({'Country Name': ['Indonesia', 'France'],
                        'Indicator': ['Literacy', 'Literacy'],
                        '2014': ['88.0', '98.0'],
                        '2015': ['89.0', '98.0'],
                        'Country Code': ['IDN', 'FRA'],
                        'Population': [80000000, 67000000],
                        'Income Group': ['Upper Middle', 'High, OED']})

I want to add rows for these two countries for a new indicator: 'Internet users'.

I know how to add the rows and fill the values for 'Country Name', 'Indicator', '2014', '2015'. See below:

for coun in ['Indonesia', 'France']:
    data = data.append({'Country Name': coun, 'Indicator': 'Internet Users', '2014': 95, '2015': 95, 'Country Code': np.NaN, 'Population': np.NaN , 'Income Group': np.NaN },
                        ignore_index=True)

But how can I fill automatically the last three columns (np.NaN in appended data) according the Country Name ? In excel I would have used a vlookup to easily get the data with the Country Name as a key, but I am clueless with Pandas.

From what I read here, I believe it can be done using, for each 'red' column, map() and a dict with 'Country Name' as key and the values of the column as the dict values, but I wondered whether there was a lighter solution.

code for the desired df:

desired_output = pd.DataFrame({'Country Name': ['Indonesia', 'France', 'Indonesia', 'France'],
                    'Indicator': ['Literacy', 'Literacy', 'Internet Users', 'Internet Users'],
                    '2014': [88.0, 98.0, 95.0, 95.0],
                    '2015': [89.0, 98.0, 95.0, 95.0],
                    'Country Code': ['IDN', 'FRA', 'IDN', 'FRA'],
                    'Population': [80000000, 67000000, 80000000, 67000000],
                    'Income Group': ['Upper Middle', 'High, OED', 'Upper Middle', 'High, OED']})

Thanks

EDIT: I was able to fill the NaN using this additional code:

pd.merge(data[['Country Name', 'Indicator', '2014', '2015']] , data[data.notna().all(1)][['Country Name', 'Country Code', 'Population', 'Income Group']], how='left', on='Country Name')

But if anyone has a cleaner solution, I am all ears

P.S: Is there a way to easily paste a tab from Jupyter or Excel in a post ? I had to paste the plain text and play with spacing to make something acceptable. If anyone has a tip :)

vpvinc
  • 155
  • 2
  • 10
  • Welcome to Stackoverflow and congratulations on your first post! Note that pictures of dataframes are strongly discouraged, although I do note that you also pasted them as text. But best is to provide code that generates a dataframe you have a question about, such as described here: [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). . It is also always appreciated if you showed what you have already tried, and what problems it gave you, as in here [mcve] and [here](http://stackoverflow.com/help/how-to-ask). – piterbarg Nov 26 '20 at 17:05
  • To your specific question about Jupyter tabs or Excel, the answer is no -- best is just code that others can copy-paste and run in their own environment to reproduce your problem or test their solutions on. – piterbarg Nov 26 '20 at 17:06
  • Thanks @piterbarg, I will adapt my post asap according to your recommendations. – vpvinc Nov 26 '20 at 17:14

0 Answers0