First of all, this is not a duplicate! I have searched several SO questions (about creating a new column from a row value, like this and this) as well as the pandas docs, and I have not found anything conclusive.
Imagine I open an .xls file containing the following table and create a DataFrame from it. Since this is a small example derived from the real problem, I created this simple Excel table, which is easy to reproduce:
What I want now is to find the row that contains "Population Month Year" (I will be looking at different .xls files, but the structure is always the same: population, month, and year).
import pandas as pd
xls = 'population_example.xls'
sheet_name = 'Sheet1'
df = pd.read_excel(xls, sheet_name=sheet_name, header=0, skiprows=2)
df
What I thought is:
- Get the value of that row with `startswith`.
- Create a column by parsing that value in Python to get the month and year.
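The two steps above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the DataFrame is a hypothetical stand-in for the Excel sheet, the label is assumed to sit in the `Area` column, and the names `mask`, `label`, `Month`, and `Year` are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the real layout may differ.
df = pd.DataFrame({
    "Area": ["Population April 2017", None, "North", "South"],
    "Value": [float("nan"), float("nan"), 120.0, 80.0],
})

# Step 1: locate the label row; na=False treats missing cells as non-matches.
mask = df["Area"].str.startswith("Population", na=False)
label = df.loc[mask, "Area"].iloc[0]

# Step 2: split the label into its three words and broadcast
# the month and year to every row as new columns.
_, month, year = label.split()
df["Month"] = month
df["Year"] = int(year)

print(df[["Area", "Month", "Year"]])
```

Because the label is read before any cleanup, `df.dropna()` could still be applied afterwards without losing the month and year.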
I have tried several things similar to this:
dff=df[s.str.startswith('Population')]
dff
But the errors won't stop coming. The code above, specifically, raises:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
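One way this error can arise is when the boolean Series was built from an object whose index does not cover the DataFrame's index. The sketch below reproduces that situation under an assumption: `s` is taken to be a shorter slice of the column, which may or may not be what happened in the original session. The data and the `error_name` variable are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Area": ["Population April 2017", "North", "South"]})

# Assumption: `s` comes from a shorter object, so its index is only [0, 1]
# while df has the index [0, 1, 2].
s = df["Area"].iloc[:2]

error_name = None
try:
    df[s.str.startswith("Population")]
except Exception as exc:
    # The mask cannot be aligned with df's index, so indexing fails.
    error_name = type(exc).__name__
print(error_name)

# Building the mask from a column of df itself keeps the two indexes identical.
aligned = df[df["Area"].str.startswith("Population")]
print(aligned)
```

Deriving the mask from `df['Area']` rather than from a separate `s` avoids the alignment problem in this sketch.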
I have several guesses:
- I do not properly understand how `Series` work in pandas, even after reading the docs. I did not even think of using them, but `startswith` looks like the thing I am looking for.
- If I handle this properly, I might get a NaN error, but I cannot use `df.dropna()` yet, as I would lose that row value (`Population April 2017`)!
Edit:
The problem with using this:
df[df['Area'].str.startswith('Population')]
is that it will also check the NaN values.
And this:
df['Area'].str.startswith('Population')
will give me a True/False/NaN set of values, which I am not sure how to use.
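A mask with missing entries can be turned into a plain True/False mask through the `na` parameter of `str.startswith`. The snippet below is a small sketch on hypothetical data (the `area` Series stands in for `df['Area']`), not a statement about what the real sheet contains.

```python
import pandas as pd

# Hypothetical column with one missing cell.
area = pd.Series(["Population April 2017", None, "North"])

# Without `na`, the missing cell may show up as NaN in the result.
raw_mask = area.str.startswith("Population")
print(raw_mask)

# With na=False, missing cells count as "does not start with",
# so the mask is purely boolean and safe to use as an indexer.
clean_mask = area.str.startswith("Population", na=False)
matches = area[clean_mask]
print(matches)
```

The same `clean_mask` can be passed to `df[...]` or `df.loc[...]` as long as it is built from a column of that same DataFrame.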