I have some input CSV from which I am building a second dataframe based on certain conditions. Part of the CSV looks like this:
Input
│ Date │ Instrument │ Value │
├──────────────────┼────────────┼─────────┤
│ 2018-09-01 00:05 │ Foo │ -0.3552 │
│ 2018-09-01 00:05 │ Bar │ -0.4423 │
│ 2018-09-01 00:05 │ Foo │ -0.6511 │
│ 2018-09-01 00:05 │ Bar │ -0.8773 │
│ 2018-09-01 00:15 │ Foo │ 0.0045 │
│ 2018-09-01 00:15 │ Bar │ 0.0904 │
│ 2018-09-01 00:20 │ Foo │ 0.6691 │
│ 2018-09-01 00:20 │ Bar │ 0.7401 │
In this step, I am trying to create a dataframe indexed by time that should look like this:
Output
│ │ Column_Foo │ Column_Bar │
├──────────────────┼────────────┼────────────┤
│ 2018-09-01 00:05 │ -1.0063 │ -1.3196 │
│ 2018-09-01 00:15 │ 0.0045 │ 0.0904 │
│ 2018-09-01 00:20 │ 0.6691 │ 0.7401 │
Essentially, I want to sum together any values that occur on the same datetime in the CSV input.
I can use newDf.loc[index, column] = value
to add value
to a new cell in location index, column
, but I see no way of appending value
to the number in the cell if it already contains a number.
newDf.loc[index, column] += value
fails with a KeyError if the index or column doesn't exist yet (even if you put it inside a pd.isnull()
!). And if they both do exist but the cell is empty, newDf.loc[index, column]
returns NaN
which together with +=
just yields NaN
.
My attempt
My best attempt so far, which works but I really hope there's a cleaner way to do this.
def awfulWayOfDoingThings(row, out):
if row.Date in out.index and "Column_" + row.Instrument in out.columns:
if np.isnan(out.at[row.Date, "Column_" + row.Instrument]):
out.at[row.Date, "Column_" + row.Instrument] = 0.0
out.at[row.Date, "Column_" + row.Instrument] += row.Value
else:
out.at[row.Date, "Column_" + row.Instrument] = row.Value
newDf = pd.DataFrame()
inputDf.apply(awfulWayOfDoingThings, axis=1, args=[newDf])
I essentially have to check:
- Does the index exist already?
- Does the column exist already?
- Is the cell NaN?
I really think I must be getting something wrong here and that there must be a significantly simpler way to do this.
What I would like to do
def lovelyWayOfDoingThings(row, out):
out.at[row.Date, "Column_" + row.Instrument] += row.Value
newDf = pd.DataFrame()
inputDf.apply(lovelyWayOfDoingThings, axis=1, args=[newDf])