I have a dataframe that contains 5 columns and I am using pandas and numpy to edit and work with the data.
id calv1 calv2 calv3 calv4
1 2006-08-29 2007-08-29 2008-08-29 2009-08-29
2 NaT NaT NaT NaT
3 2006-08-29 NaT NaT NaT
4 2006-08-29 2007-08-29 2010-08-29 NaT
5 2006-08-29 2013-08-29 NaT NaT
6 2006-08-29 NaT 2013-08-29 2013-08-292
I want to create another column that counts the number of "calv" that occur for each id. However it matters to me if there are missing values inbetween other values, see row 6. Then I want there to be a NaN or perhaps some other value indicating this is not a correct row.
id calv1 calv2 calv3 calv4 no_calv
1 2006-08-29 2007-08-29 2008-08-29 2009-08-29 4
2 NaT NaT NaT NaT 0
3 2006-08-29 NaT NaT NaT 1
4 2006-08-29 2007-08-29 2010-08-29 NaT 3
5 2006-08-29 2013-08-29 NaT NaT 2
6 2006-08-29 NaT 2013-08-29 2013-08-292 NaN #or some other value
Here is my last attempt:
nat = np.datetime64('NaT')
df.loc[
(df["calv1"] == nat) & (df["calv2"] == nat) &
(df["calv3"] == nat) & (df["calv4"] == nat),
"no_calv"] = 0
#1 calvings
df.loc[
(df["calv1"] != nat) & (df["calv2"] == nat) &
(df["calv3"] == nat) & (df["calv4"] == nat),
"no_calv"] = 1
#2 calvings
df.loc[
(df["calv1"] != nat) & (df["calv2"] != nat) &
(df["calv3"] == nat) & (df["calv4"] == nat),
"no_calv"] = 2
#3 calvings
df.loc[
(df["calv1"] != nat) & (df["calv2"] != nat) &
(df["calv3"] != nat) & (df["calv4"] == nat),
"no_calv"] = 3
#4 or more calvings
df.loc[
(df["calv1"] != nat) & (df["calv2"] != nat) &
(df["calv3"] != nat) & (df["calv4"] != nat),
"no_calv"] = 4
But the result is that the whole "no_calv" column is 4.0
I previously tried things like
..
(df["calv1"] != "NaT")
..
And
..
(df["calv1"] != pd.nat)
..
And the result was always 4.0 for the whole column or just NaN. I can't seem to find a way of telling python what the NaT values are?
Any tips and tricks for a new python user? I've done this both in SAS and in Fortran using if and elseif statements but I am trying to find the best way to do this in Python.
Edit: I'm really curious to know if this can be done by if or ifelse statements.
And now I'm also thinking I would like to be able to have other columns in the dataframe that contain extra info but are not needed for this exact purpose. An example (an added yx column):
id yx calv1 calv2 calv3 calv4 no_calv
1 27 2006-08-29 2007-08-29 2008-08-29 2009-08-29 4
2 34 NaT NaT NaT NaT 0
3 89 2006-08-29 NaT NaT NaT 1
4 23 2006-08-29 2007-08-29 2010-08-29 NaT 3
5 11 2006-08-29 2013-08-29 NaT NaT 2
6 43 2006-08-29 NaT 2013-08-29 2013-08-292 NaN #or some other value