This question can be solved very easily by using pandas and numpy:
import pandas as pd
import numpy as np

data = "MRN: 5399394 \n Adfdf Kim \n Telemedicine: \n 3/29/2021 \n INT Pediatric Specialties - \n G \n Encounter providers: \n DB Polar, MD (Genetics) and Bar K Wright, \n RD/LD (Nutrition) \n Primary diagnosis: \n HUG \n Reason for Visit: \n Referred by Provider Not In System"

# Split the raw text into lines and trim the surrounding whitespace
x = data.split("\n")
x = [i.strip() for i in x]
d = {}
print(x)

flag = False         # the current line contains a column delimiter (":" or "-")
colon = False        # which delimiter was seen last on the current line
column = ""          # name of the column currently being filled
multi_entry = False  # inside a comma-separated value that spans several lines
add_all = False      # the multi-line value is complete and must be stored
count = 0            # number of comma-containing lines in the current value

for current_index, i in enumerate(x):
    for j in i:
        if j in [":", "-"]:
            flag = True
            colon = j == ":"
            if multi_entry:
                add_all = True
        if j == ",":
            multi_entry = True
            count += 1
            break
    if flag:
        # Header line: the text before the delimiter is the column name
        if colon:
            temp = i.split(":")
        else:
            temp = i.split("-")
        d[temp[0].strip()] = []
        if temp[1] != '':
            d[temp[0].strip()].append(temp[1].strip())
        column = temp[0].strip()
        if add_all:
            # Store the multi-line value under the previous column: the
            # `count` comma lines plus the final line before this header
            d[list(d.keys())[-2]].append(
                "\n ".join(x[current_index - 1 - count:current_index]))
            multi_entry = False
            add_all = False
            count = 0
    else:
        if multi_entry:
            # Lines of a multi-line value are collected at the next header
            continue
        d[column].append(i)
    flag = False

# Determine the maximum length among all lists
max_length = max(len(v) for v in d.values())
# Pad the lists with NaN to make them the same length
for key, values in d.items():
    d[key] = values + [np.nan] * (max_length - len(values))
# Create a DataFrame from the padded dictionary
df = pd.DataFrame(d)
print(df)
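As a side note, the manual padding loop above can also be left to pandas. This is a minimal sketch, not part of the original solution: it assumes `d` maps column names to lists of unequal length (the small dictionary here is only a stand-in for the parsed result) and relies on the fact that a `DataFrame` built from a dict of `Series` aligns them on the index and fills the gaps with NaN.

```python
import pandas as pd

# Stand-in for the parsed dictionary: column name -> list of values,
# where the lists do not all have the same length
d = {"MRN": ["5399394", "Adfdf Kim"], "Telemedicine": ["3/29/2021"]}

# Wrapping each list in a Series lets pandas align on the index;
# shorter columns are filled with NaN automatically
df = pd.DataFrame({key: pd.Series(values, dtype="object")
                   for key, values in d.items()})
print(df)
```

The explicit `dtype="object"` is there so that an empty list still produces a well-defined column.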
Since you didn't specify whether your dataframe holds a single entry like this one or multiple rows, I assumed a single row. Either way, you could extract the raw text from the dataframe as a list of entries and then split each entry on the column delimiters that you specified.
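If the dataframe does have several rows, the idea could look roughly like the sketch below. Everything named here is an assumption for illustration: the column name `notes`, the helper `parse_record`, and the second record (which is made up). The helper is also a simplified variant of the parsing, not the loop from the code above: it treats any line containing ":" or ending in "-" as a new column and joins all following lines into that column's value, so a value such as a time ("10:30") would be split incorrectly.

```python
import pandas as pd


def parse_record(text):
    """Split one raw record into a {column: value} dict (hypothetical helper)."""
    fields = {}
    current = None
    for line in (part.strip() for part in text.split("\n")):
        if not line:
            continue
        if ":" in line:
            # "Name: value" or "Name:" starts a new column
            name, _, value = line.partition(":")
        elif line.endswith("-"):
            # "Name -" also starts a new column, with no inline value
            name, value = line[:-1], ""
        else:
            # Any other line continues the current column
            if current is not None:
                fields[current].append(line)
            continue
        current = name.strip()
        fields[current] = [value.strip()] if value.strip() else []
    return {name: " ".join(values) for name, values in fields.items()}


# Assumed layout: one raw text record per row in a column called "notes".
# The second record is invented purely to show the multi-row case.
raw = pd.DataFrame({"notes": [
    "MRN: 5399394 \n Adfdf Kim \n Telemedicine: \n 3/29/2021",
    "MRN: 1234567 \n Jane Doe \n Telemedicine: \n 4/02/2021",
]})

# Parse every entry and build one output row per record
parsed = pd.DataFrame([parse_record(text) for text in raw["notes"]])
print(parsed)
```

Each record becomes one row, and columns missing from a record come out as NaN.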
Hope this helps!