You can try:
out = df[df.columns[0]].str.split(r"\s+")\
.apply(lambda x: pd.Series([" ".join(x[:2])] + ["{}{}".format(a, b) for a, b in zip(x[2::2], x[3::2])])) \
.rename(columns={old:new for old, new in enumerate(cols)})
Explanations:
- After selecting the only column using
df[df.columns[0]]
, split this column according all spaces using pandas.Series.str.split
. The regex is simple \s+
: df[df.columns[0]].str.split(r"\s+")
- Apply a custom function on each rows using
apply
.
- First recreate the date by merging the 2 first elements with a space, convert to a date and wrap it in a list:
[pd.Timestamp(" ".join(x[:2]))]
- Merge all the others values 2-by2 using
zip
. This discussion provides more details.
[float("{}{}".format(a, b)) for a, b in zip(x[2::2], x[3::2])]
Convert this list in pd.Series
wrapping result from step 2.1 and 2.2.
Rename the columns using rename
. A dict-comprehension let us perform the expected result: .rename(columns={old:new for old, new in enumerate(cols)})
Full code + illustration:
print(df)
# Date/Time Var1 Var2 Var3 Var4 Var5 Var6
# 0 17/04/2020 00:00:00 133 579.20 31 978.90 377 9...
# 1 17/04/2020 01:00:00 133 583.70 31 980.40 377 9...
# Step 1
print(df[df.columns[0]].str.split(r"\s+"))
# 0 [17/04/2020, 00:00:00, 133, 579.20, 31, 978.90...
# 1 [17/04/2020, 01:00:00, 133, 583.70, 31, 980.40...
# Name: Date/Time Var1 Var2 Var3 Var4 Var5 Var6, dtype: object
# Step 2.1
print(df[df.columns[0]].str.split(r"\s+")
.apply(lambda x: [pd.Timestamp(" ".join(x[:2]))]))
# 0 [2020-04-17 00:00:00]
# 1 [2020-04-17 01:00:00]
# Name: Date/Time Var1 Var2 Var3 Var4 Var5 Var6, dtype: object
# Step 2.2
print(df[df.columns[0]].str.split(r"\s+")
.apply(lambda x: [float("{}{}".format(a, b)) for a, b in zip(x[2::2], x[3::2])]))
# 0 [133579.2, 31978.9, 377952.81, 179412.41, 2035...
# 1 [133583.7, 31980.4, 377963.41, 179412.41, 2035...
# Name: Date/Time Var1 Var2 Var3 Var4 Var5 Var6, dtype: object
# Step 2.3
print(df[df.columns[0]].str.split(r"\s+")
.apply(lambda x: pd.Series([pd.Timestamp(" ".join(x[:2]))] + [float("{}{}".format(a, b)) for a, b in zip(x[2::2], x[3::2])])))
# 0 1 2 3 4 5 6
# 0 2020-04-17 00:00:00 133579.2 31978.9 377952.81 179412.41 203595.2 70447.4
# 1 2020-04-17 01:00:00 133583.7 31980.4 377963.41 179412.41 203595.2 70448.2
# Step 3
print(df.columns[0].split(" "))
# ['Date/Time', 'Var1', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6']
out = df[df.columns[0]].str.split(r"\s+")\
.apply(lambda x: pd.Series([pd.Timestamp(" ".join(x[:2]))] + [float("{}{}".format(a, b)) for a, b in zip(x[2::2], x[3::2])])) \
.rename(columns={old: new for old, new in enumerate(df.columns[0].split(" "))})
print(out)
# Date/Time Var1 Var2 Var3 Var4 Var5 Var6
# 0 2020-04-17 00:00:00 133579.2 31978.9 377952.81 179412.41 203595.2 70447.4
# 1 2020-04-17 01:00:00 133583.7 31980.4 377963.41 179412.41 203595.2 70448.2
print(out.dtypes)
# Date/Time datetime64[ns]
# Var1 float64
# Var2 float64
# Var3 float64
# Var4 float64
# Var5 float64
# Var6 float64
# dtype: object