
I have a data set represented in a Pandas object, see below:

datetime       season  holiday  workingday  weather  temp  atemp   humidity  windspeed  casual  registered  count
1/1/2011 0:00  1       0        0           1        9.84  14.395  81        0          3       13          16
1/1/2011 1:00  1       0        0           2        9.02  13.635  80        0          8       32          40
1/1/2011 2:00  1       0        0           3        9.02  13.635  80        0          5       27          32

p_type_1 = pd.read_csv("Bike Share Demand.csv")

p_type_1 = (p_type_1 >> 
            rename(date = X.datetime))

p_type_1.date.str.split(expand=True,)
p_type_1[['Date','Hour']] = p_type_1.date.str.split(" ",expand=True,)

p_type_1['date'] = pd.to_datetime(p_type_1['date'])

p_hour = p_type_1["Hour"]
p_hour

Now I am trying to take the sum of my column Hour that I created (p_hour)

p_hours = p_type_1["Hour"].sum()
p_hours

and get this error: TypeError: must be str, not int

so I then tried:

p_hours = p_type_1(str["Hour"].sum())
p_hours

and get this error: TypeError: 'type' object is not subscriptable

I just want the sum. What gives?

2 Answers

0

There's quite a bit going on in here that's not correct. So I'll try to break down the issues and offer alternatives.

Here:

p_hours = p_type_1(str["Hour"].sum())
p_hours

What you were probably trying to write is this:

p_hours = p_type_1[str("Hour")].sum()
p_hours

Instead, your code indexes the built-in str type itself with "Hour". Types are not subscriptable, hence the TypeError. That's not what you are trying to do. This crash is unrelated to your core problem: it comes from misplaced parentheses and brackets, not from your data. Strictly speaking it is a runtime TypeError rather than a syntax error.
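
The second error can be reproduced without pandas at all, which may help show that it has nothing to do with the data. A minimal sketch (the function name is made up for illustration):

```python
# Subscripting the built-in type `str` fails before pandas is ever involved.
def subscript_str_type():
    try:
        str["Hour"]  # indexes the type itself, not a DataFrame column
    except TypeError as exc:
        return exc
    return None

error = subscript_str_type()
print(type(error).__name__, error)
```

The exact wording of the message can differ between Python versions, but the exception type is TypeError.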

The actual problem is that your dataframe column mixes string and integer values. The sum operation concatenates strings or adds numbers, but on a column with mixed types it fails.

To verify that this is the issue, however, we would need to see your actual dataframe, as I have a feeling the one you gave may not be the correct one.

As a proof of concept, I created the following example:

import pandas as pd
dta = [str(x) for x in range(20)]
dta.append(12)
frame = pd.DataFrame.from_dict({
    "data": dta})

print(frame["data"].sum())

>>> TypeError: can only concatenate str (not "int") to str

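Note that the wording of this message comes from Python itself: Python 3.6 and earlier report "must be str, not int", while newer versions give the clearer message shown here.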
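
If the column really does contain mixed types, one possible way to confirm it and then get a numeric sum is sketched below. It reuses the made-up data from the proof of concept; pd.to_numeric with errors="coerce" is only one option and turns unparseable values into NaN, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical data mirroring the proof of concept: strings plus one stray int.
frame = pd.DataFrame({"data": [str(x) for x in range(20)] + [12]})

# Count the Python type names actually stored in the object column.
type_counts = frame["data"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Coerce everything to numbers; values that cannot be parsed become NaN.
numeric = pd.to_numeric(frame["data"], errors="coerce")
total = numeric.sum()
print(total)
```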

0

The problem is your dataframe's data types. Take a closer look at this question: Convert DataFrame column type from string to datetime, dd/mm/yyyy format

Here is sample code that should solve your problem. I simplified the CSV:

'''
CSV

datetime,season
1/1/2011 0:00,1
1/1/2011 1:00,1
1/1/2011 2:00,1

'''

import pandas as pd

p_type_1 = pd.read_csv("Bike Share Demand.csv")
p_type_1['datetime'] = p_type_1['datetime'].astype('datetime64[ns]')
# extract the hour as an integer column via the .dt accessor
p_type_1['hour'] = p_type_1['datetime'].dt.hour
print(p_type_1['hour'].sum())
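
For anyone who wants to try this without the CSV file, here is a self-contained sketch of the same idea. The inline data, the variable names, and the format string are assumptions based on the three sample rows (month/day/year hour:minute), not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical inline copy of the simplified CSV, so no file is needed.
csv_text = "datetime,season\n1/1/2011 0:00,1\n1/1/2011 1:00,1\n1/1/2011 2:00,1\n"
bikes = pd.read_csv(io.StringIO(csv_text))

# Parse the timestamps explicitly; the format is an assumption from the sample rows.
bikes["datetime"] = pd.to_datetime(bikes["datetime"], format="%m/%d/%Y %H:%M")

# The .dt accessor exposes the hour as an integer column that can be summed.
bikes["hour"] = bikes["datetime"].dt.hour
hour_total = bikes["hour"].sum()
print(hour_total)
```

Because the hour is now an integer column rather than a string like "0:00", sum() does arithmetic instead of trying to concatenate.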