I have a pandas dataframe that looks like this:
id | year | month | day | hour | timestamp | location_statindex |
---|---|---|---|---|---|---|
TP6045276120589011 | 2022 | 09 | 01 | 02 | 2022-09-01 02:01:23 | 1 |
TP6031280073133032 | 2022 | 12 | 16 | 01 | 2022-12-16 01:48:42 | 1 |
with around 300.000 rows.
With dytpes
id object
year object
month object
day object
hour object
timestamp datetime64[ns]
location_statindex object
dtype: object
Now I create a new column which contains elements from each row, like:
1\2022\09\01\02
with this code:
df['folder_path'] = df[['location_statindex', 'year', 'month', 'day', 'hour']].agg('\\'.join, axis=1)
The question is now if it is possible to change the aggregation, so that the value from location_statindex is a different string depending on an if condition.
Like this:
location = '2'
if location in ['1','2','3','4']:
location = f'Platz_optimiert_{location}'
elif location in ['5','6']:
location = f'KSPlatz_{location}'
else:
location = f'Platz_optimiert_TEF_{location}'
Expected output is a new column in the same dataframe with a joined string from the element of the corresponing row. For example the following string for the first row.
'Platz_optimiert_1\2022\09\01\02'
I already have a solution that uses the apply function from pandas with a self written function which contains the if block. But thinking about the size of the dataframe I want to try to reduce the necessary time. Is aggregation even faster than apply?