I feel that the solution to this problem is really simple, but I am unable to figure it out.
So, I have a pandas DataFrame, shown in the screenshot below:
Column names do not matter, so I left them out of the picture. What is important, though, is that the first column holds a list of values in each row. If you look at the lower part of the image, one row holds the list [Bolivia , Plurinational State of)]. I am trying to take the first value from that list, i.e. Bolivia, and save it in the same row in place of the list. If I use something like energy["Country"][0] (the column in question is named "Country"), I can extract the value, and it should also extract the required values from the other rows, since their lists contain only one value. But for some reason I get an error.
Here is what I tried:
import numpy as np
import pandas as pd

def answer_one():
    energy = pd.read_excel('Energy Indicators.xls',
                           sheet_name='Energy',
                           skiprows=[10,11,12,13,14,15,16,17],
                           skipfooter=38,
                           header=9,
                           parse_cols=[2,3,4,5], na_values = "...")
    energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    energy["Energy Supply"] = energy["Energy Supply"].mul(1000000)
    energy["Country"] = energy["Country"].str.split("(")[0]
    return energy

answer_one()
It is the trailing [0] at the end of energy["Country"] = energy["Country"].str.split("(") that is causing the trouble. The error I get is as follows:
ValueError: Length of values does not match length of index
Is there a way around this?
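For reference, here is a minimal sketch of one possible way around it. It assumes the cause is that a plain [0] on the split result selects the single row labelled 0 (one short list) rather than the first item of every row's list, so pandas is asked to fit that short list into a full-length column. The three-row frame is a hypothetical stand-in for the real spreadsheet, not data taken from it.

```python
import pandas as pd

# Hypothetical stand-in for the "Country" column of the real spreadsheet.
energy = pd.DataFrame(
    {"Country": ["Bolivia (Plurinational State of)", "Brazil", "Iran (Islamic Republic of)"]}
)

# .str.split("(") produces one list per row.
# .str[0] then takes the first item of each of those lists (element-wise),
# and .str.strip() removes the space left in front of the bracket.
energy["Country"] = energy["Country"].str.split("(").str[0].str.strip()

print(energy["Country"].tolist())  # ['Bolivia', 'Brazil', 'Iran']
```

The second .str accessor is what makes the indexing apply per row, so the result has the same length as the index.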
Also, one more small question: is there a way to widen the second column, or otherwise change how it is displayed, so that values like 1.430000e+08 are shown in their natural form?
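A small sketch of what "natural form" could look like, assuming the scientific notation comes from pandas' default float display rather than from the column width. It uses the display.float_format option; the two numbers are made-up sample values, and the thousands-separator format is just one possible choice.

```python
import pandas as pd

# Made-up sample values large enough to trigger scientific notation by default.
df = pd.DataFrame({"Energy Supply": [1.43e8, 3.21e11]})

# Temporarily render floats with thousands separators and no decimals.
# pd.set_option("display.float_format", ...) would apply the same setting globally.
with pd.option_context("display.float_format", "{:,.0f}".format):
    text = df.to_string()

print(text)
```

This only changes how the numbers are printed; the stored values stay as floats.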
(The full file "Energy Indicators.xls" can be found here)