
I have data that looks like this (output from jq):

script_runtime{application="app1",runtime="1651394161"} 1651394161
folder_put_time{application="app1",runtime="1651394161"} 22
folder_get_time{application="app1",runtime="1651394161"} 128.544
folder_ls_time{application="app1",runtime="1651394161"} 3.868
folder_ls_count{application="app1",runtime="1651394161"} 5046

I want a dataframe in which each row is split into separate fields, like this:

script_runtime,app1,1651394161,1651394161
folder_put_time,app1,1651394161,22

It's in a text file. How can I easily load it into pandas for data manipulation?

Stat.Enthus

1 Answer

  1. Load the .txt file using pd.read_csv(), specifying whitespace as the separator. The result is a two-column dataframe with the name and its brace-enclosed labels in the first column, and the numeric value (parsed as a float) in the second column.
df = pd.read_csv("textfile.txt", header=None, delimiter=r"\s+")
  2. Parse the bracketed text into separate columns:
df['function'] = df[0].str.split("{",expand=True)[0]
df['application'] = df[0].str.split("\"",expand=True)[1]
df['runtime'] = df[0].str.split("\"",expand=True)[3]

The result is a dataframe with the original columns 0 and 1 plus the new function, application, and runtime columns.

If you want to drop the first column which contains the bracketed value:
df = df.iloc[: , 1:]


Full code:

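import pandas as pd

# Read the file as two whitespace-separated columns: 0 (name plus labels) and 1 (value)
df = pd.read_csv("textfile.txt", header=None, delimiter=r"\s+")

# Split the name and the quoted label values out of column 0
df['function'] = df[0].str.split("{",expand=True)[0]
df['application'] = df[0].str.split("\"",expand=True)[1]
df['runtime'] = df[0].str.split("\"",expand=True)[3]

# Drop column 0 now that it has been parsed
df = df.iloc[: , 1:]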
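
As a possible alternative to the repeated split calls, the three fields could be pulled out in one pass with a regular expression and Series.str.extract. This is only a sketch under some assumptions: every line follows exactly the name{application="...",runtime="..."} value shape shown in the question, and the names sample, raw, pattern, parsed, "metric" and "value" are made up for illustration. The in-memory StringIO stands in for textfile.txt, and quoting=csv.QUOTE_NONE is added here so the double quotes are kept as literal characters.

```python
import csv
import io

import pandas as pd

# A few rows copied from the question; StringIO stands in for "textfile.txt".
sample = io.StringIO(
    'script_runtime{application="app1",runtime="1651394161"} 1651394161\n'
    'folder_put_time{application="app1",runtime="1651394161"} 22\n'
    'folder_get_time{application="app1",runtime="1651394161"} 128.544\n'
)

# Two whitespace-separated columns; QUOTE_NONE keeps the quotes literal.
raw = pd.read_csv(
    sample,
    header=None,
    sep=r"\s+",
    names=["metric", "value"],  # hypothetical column names
    quoting=csv.QUOTE_NONE,
)

# Named groups become the column names of the extracted dataframe.
pattern = (
    r'^(?P<function>[^{]+)'
    r'\{application="(?P<application>[^"]*)",'
    r'runtime="(?P<runtime>[^"]*)"\}$'
)
parsed = raw["metric"].str.extract(pattern)
parsed["value"] = raw["value"]

print(parsed)
```

Rows that do not match the pattern come back as NaN in the extracted columns, which makes malformed lines easy to spot. Note that runtime is extracted as a string here, so it would need converting if numeric operations are wanted.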
K. Thorspear
  • Thanks! The output still looks like this: `>>> print(df[1:2]) 0 1 1 folder_put_time{application="app1",runtime="16... 22.0` – Stat.Enthus May 03 '22 at 06:11
  • Apologies, my parsing code was missing some elements. I've updated it above. Running `print(df[1:2])` now outputs the properly formatted columns. – K. Thorspear May 03 '22 at 06:25
  • Edited my post a bit. Rerunning this did separate it into two columns but I need to extract all – Stat.Enthus May 03 '22 at 09:28
  • Thank you for doing that! I've edited my answer as well -- the code above should output those columns. If you're looking to remove the original first column with the three values that have been parsed, I've added that code too – K. Thorspear May 03 '22 at 14:51
  • I wondered if your question about extracting all the rows was because you are running `print(df[1:2])`. This will only print one row (the second one). If you want to print all the rows, you would run `print(df)`. – K. Thorspear May 03 '22 at 16:44
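
To illustrate the slicing point in the last comment, here is a small sketch on a made-up three-row dataframe (the values are taken from the question, but the frame itself is hypothetical). A slice such as df[1:2] selects rows by position and excludes the end, so it returns only the second row.

```python
import pandas as pd

# Hypothetical three-row frame, just to show how row slicing behaves.
df = pd.DataFrame(
    {
        "function": ["script_runtime", "folder_put_time", "folder_get_time"],
        "value": [1651394161, 22, 128.544],
    }
)

one_row = df[1:2]  # positional slice: start at row 1, stop before row 2

print(one_row)  # only the folder_put_time row
print(df)       # all three rows
```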