Loading semi-structured data into pandas

Question

I have data that looks like this (from jq)

script_runtime{application="app1",runtime="1651394161"} 1651394161
folder_put_time{application="app1",runtime="1651394161"} 22
folder_get_time{application="app1",runtime="1651394161"} 128.544
folder_ls_time{application="app1",runtime="1651394161"} 3.868
folder_ls_count{application="app1",runtime="1651394161"} 5046

In the dataframe, each row should be split into fields like this, so that I can manipulate them:

script_runtime,app1,1651394161,1651394161
folder_put_time,app1,1651394161,22

It's in a text file. How can I easily load it into pandas for data manipulation?

K. Thorspear · Accepted Answer · 2022-05-03T16:47:54.493


Load the .txt file with pd.read_csv(), specifying whitespace as the separator. The result is a two-column dataframe with the name and its bracketed labels in the first column and the numeric value in the second column.

import pandas as pd

df = pd.read_csv("textfile.txt", header=None, delimiter=r"\s+")

Parse the bracketed text into separate columns:

df['function'] = df[0].str.split("{",expand=True)[0]
df['application'] = df[0].str.split("\"",expand=True)[1]
df['runtime'] = df[0].str.split("\"",expand=True)[3]
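
As an alternative to the three splits, a single regular expression can pull out all the fields at once. This is only a sketch: it assumes every line has exactly the name{application="...",runtime="..."} shape from the question, and the pattern, the "value" column name, and the inline sample data are my own illustration rather than part of the code above.

```python
import io
import pandas as pd

# Two lines copied from the question, used here instead of textfile.txt.
sample = io.StringIO(
    'script_runtime{application="app1",runtime="1651394161"} 1651394161\n'
    'folder_put_time{application="app1",runtime="1651394161"} 22\n'
)
df = pd.read_csv(sample, header=None, sep=r"\s+")

# Named groups become the column names of the extracted dataframe.
pattern = (
    r'^(?P<function>[^{]+)'
    r'\{application="(?P<application>[^"]*)"'
    r',runtime="(?P<runtime>[^"]*)"\}$'
)
parsed = df[0].str.extract(pattern)
parsed["value"] = df[1]  # hypothetical name for the numeric column
print(parsed)
```

Rows that do not match the pattern come back as NaN in all three extracted columns, which makes malformed lines easy to spot.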

The result is a dataframe with the original two columns (0 and 1) followed by the new function, application, and runtime columns.

If you want to drop the first column, which contains the bracketed value:
df = df.iloc[: , 1:]
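
The positional slice works because the bracketed text is always the first column. If you would rather name the column being removed, dropping by label should give the same result. A small sketch with a hand-built one-row dataframe (the sample row is made up for illustration):

```python
import pandas as pd

# One-row stand-in for the dataframe after parsing.
df = pd.DataFrame({
    0: ['folder_put_time{application="app1",runtime="1651394161"}'],
    1: [22.0],
    "function": ["folder_put_time"],
})

by_position = df.iloc[:, 1:]    # drop whatever column comes first
by_label = df.drop(columns=0)   # drop the column labelled 0

print(by_position.equals(by_label))
```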

Full code:

import pandas as pd

df = pd.read_csv("textfile.txt", header=None, delimiter=r"\s+")

df['function'] = df[0].str.split("{",expand=True)[0]
df['application'] = df[0].str.split("\"",expand=True)[1]
df['runtime'] = df[0].str.split("\"",expand=True)[3]

df = df.iloc[: , 1:]
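
To get the column order shown in the question (function, application, runtime, value), the result can be renamed and reordered. The sketch below reads the question's sample lines from a string instead of textfile.txt so it runs on its own; the "value" column name, the "tidy" variable, and the integer conversion of runtime are my assumptions, not something the question asked for.

```python
import io
import pandas as pd

# Sample lines from the question, standing in for textfile.txt.
raw = '''script_runtime{application="app1",runtime="1651394161"} 1651394161
folder_put_time{application="app1",runtime="1651394161"} 22
folder_get_time{application="app1",runtime="1651394161"} 128.544
folder_ls_time{application="app1",runtime="1651394161"} 3.868
folder_ls_count{application="app1",runtime="1651394161"} 5046
'''

df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+")
df["function"] = df[0].str.split("{", expand=True)[0]
df["application"] = df[0].str.split('"', expand=True)[1]
df["runtime"] = df[0].str.split('"', expand=True)[3]

# Rename the numeric column and put the columns in the requested order.
tidy = df.rename(columns={1: "value"})[
    ["function", "application", "runtime", "value"]
].copy()

# runtime is parsed as text; convert it so it can be compared or sorted numerically.
tidy["runtime"] = tidy["runtime"].astype("int64")

print(tidy)
```

From here, tidy.to_csv("out.csv", index=False, header=False) would write comma-separated rows, though the numeric column is written as floats because the input mixes integers and decimals.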

edited May 03 '22 at 16:47

answered May 03 '22 at 05:45


Thanks! The output still shows the following: `>>> print(df[1:2]) 0 1 1 folder_put_time{application="app1",runtime="16... 22.0` – Stat.Enthus May 03 '22 at 06:11
Apologies, my parsing code was missing some elements. I've updated it above. Running `print(df[1:2])` now outputs the properly formatted columns. – K. Thorspear May 03 '22 at 06:25
Edited my post a bit. Rerunning this did separate it into two columns but I need to extract all – Stat.Enthus May 03 '22 at 09:28
Thank you for doing that! I've edited my answer as well -- the code above should output those columns. If you're looking to remove the original first column with the three values that have been parsed, I've added that code too – K. Thorspear May 03 '22 at 14:51
I wondered if your question about extracting all the rows was because you are running `print(df[1:2])`. This will only print one row (the second one). If you want to print all the rows, you would run `print(df)`. – K. Thorspear May 03 '22 at 16:44
