0

I have a data which has the following structure in only one column:

Datetime stamp 1
Obs1
Obs2
Obs3
Datetime stamp 2
Obs1
Obs2
Obs3

I want to convert it like above. Such that date time will be header and all obs of that specific datetime will become rows of that specific date time stamp

Date time stamp 1.     Date time stamp2
Obs1                         Obs1
Obs2.                         obs2
Obs3.                         Obs3
S.B
  • 13,077
  • 10
  • 22
  • 49

2 Answers2

1

Assuming your single column is stored in a list/array, you can make the sub-lists you want like this :

lst = ['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3', 'Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']

result = []
temp = [lst[0]]
for item in lst[1:]:
    if item.startswith('Datetime'):
        result.append(temp)
        temp = [item]
    else:
        temp.append(item)

result.append(temp)
print(result)

output:

[['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3'], ['Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']]

It is now a list of lists which every single elements inside can represent a column for you.

S.B
  • 13,077
  • 10
  • 22
  • 49
0

Assuming the format is the same all along (i.e all splits starts with the string "Datetime"), you can get the index of where the string starts with "Datetime", and chose all data between each splits:

import pandas as pd

data = pd.Series(["Datetime stamp 1",
"Obs1",
"Obs2",
"Obs3",
"Datetime stamp 2",
"Obs1",
"Obs2",
"Obs3"])


#Get splits
idx_split =data.str.startswith("Datetime ")
idx_split = idx_split.index[idx_split] # [0,4]


N_COLS = len(idx_split) #number of columns
vals = [0]*N_COLS #Initialize values

#Loop over each split-index and slize data
for i in range(N_COLS-1):
    vals[i] = list(data[idx_split[i]:idx_split[i+1]])
vals[i+1] = list(data[idx_split[-1]:]) #Get the last one

print(vals)
#[['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3'],
#['Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']]


#Get the first element from each list and use that as column name
# + remove it
cols = [p.pop(0) for p in vals] 


#The data list is in wrong shape for pandas, use https://stackoverflow.com/questions/6473679/transpose-list-of-lists to transpose the list to right shape

df = pd.DataFrame(list(map(list, zip(*vals))),columns = cols)
print(df)

#Datetime stamp 1   Datetime stamp 2
#0  Obs1    Obs1
#1  Obs2    Obs2
#2  Obs3    Obs3

CutePoison
  • 4,679
  • 5
  • 28
  • 63
  • Hello friend , I still have a doubt my data is too large and datetime stamp is an object data type . And the raw data is an text file which I am reading using read.csv . – Sujit Thakur Oct 11 '21 at 23:08
  • what do you mean with "data is too large"? To fit in memory? And do you need to convert the data to "datetime" or..? – CutePoison Oct 12 '21 at 05:37