How can I convert raw text into a pandas dataframe in Python?

Question

I have a text file containing data like this, formatted as a list in which the first element is a string containing the column names separated by ';', and the remaining elements are the value rows:

['Timestamp;T;Pressure [bar];Input line pressure [bar];Speed [rpm];Angular Position [degree];Wheel speed [rpm];Wheel angular position [degree];',
';1;5,281;5,303;219,727;10,283;216,363;45;',
';1;5,273;5,277;219,727;11,602;216,363;45;',
';1;5,288;5,293;205,078;12,832;216,363;45;',
';1;5,316;5,297;219,727;14,15;216,363;45;',
';1;5,314;5,307;219,727;15,469;216,363;45;',
';1;5,288;5,3;219,727;16,787;216,363;45;',
';1;5,318000000000001;5,31;219,727;18,105;216,363;45;',
';1;5,304;5,3;219,727;19,424;216,388;56,25;',
';1;5,291;5,29;219,947;20,742;216,388;56,25;',
';1;5,316;5,297;219,507;22,061;216,388;56,25;']

How can I convert this list of text into a pandas dataframe?

koPytok · Answer 1 · 2018-06-18T07:52:06.107

8

Use pd.read_csv, which reads a dataframe from text files, and pd.compat.StringIO, which makes a stream from text, like io.StringIO:

pd.read_csv(pd.compat.StringIO("\n".join(lines)), sep=";")

edited Jun 18 '18 at 07:52

answered Jun 18 '18 at 07:27

koPytok


3

I prefer this solution to the one above, because it delegates the dirty work to pandas. Use this when there isn't too much data (i/o is always a bottleneck). – cs95 Jun 18 '18 at 07:38
2

only works for pandas < 0.25 https://stackoverflow.com/questions/57104639/how-to-fix-importerror-cannot-import-name-stringio – mzakaria Jan 17 '21 at 23:48
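As the comment above points out, pd.compat.StringIO is not available in newer pandas versions. A minimal sketch of the same idea with the standard library's io.StringIO follows. The shortened sample list, the decimal="," argument (so that values such as 5,281 are parsed as floats) and the dropping of the "Unnamed" column are additions of mine, not part of the original answer:

```python
import io

import pandas as pd

# Shortened sample in the same shape as the question's list
lines = [
    "Timestamp;T;Pressure [bar];Speed [rpm];",
    ";1;5,281;219,727;",
    ";1;5,273;219,727;",
    ";1;5,288;205,078;",
]

# io.StringIO wraps the joined text in a file-like object for read_csv;
# decimal="," makes values such as "5,281" parse as the float 5.281
df = pd.read_csv(io.StringIO("\n".join(lines)), sep=";", decimal=",")

# The trailing ';' on every line yields an extra empty column that pandas
# names "Unnamed: <n>"; drop it while keeping the (empty) Timestamp column
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(df.dtypes)
```

Whether to drop the extra column depends on your data; if the trailing ';' is not always present, check the column names before filtering.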

Nihal · Accepted Answer · 2018-06-18T07:39:29.123

code:

import pandas as pd

# Note: despite its name, df is the raw list of strings at this point
df = [
    'Timestamp;T;Pressure [bar];Input line pressure [bar];Speed [rpm];Angular Position [degree];Wheel speed [rpm];Wheel angular position [degree];',
    ';1;5,281;5,303;219,727;10,283;216,363;45;',
    ';1;5,273;5,277;219,727;11,602;216,363;45;',
    ';1;5,288;5,293;205,078;12,832;216,363;45;',
    ';1;5,316;5,297;219,727;14,15;216,363;45;',
    ';1;5,314;5,307;219,727;15,469;216,363;45;',
    ';1;5,288;5,3;219,727;16,787;216,363;45;',
    ';1;5,318000000000001;5,31;219,727;18,105;216,363;45;',
    ';1;5,304;5,3;219,727;19,424;216,388;56,25;',
    ';1;5,291;5,29;219,947;20,742;216,388;56,25;',
    ';1;5,316;5,297;219,507;22,061;216,388;56,25;']

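# Split every line on ';' to get a list of rows
mat = [n.split(';') for n in df]
print(mat)
newdf1 = pd.DataFrame(mat)
# Use the first row as the column labels, then drop it from the data
newdf1.columns = newdf1.iloc[0]
newdf1 = newdf1.reindex(newdf1.index.drop(0))
# newdf2 = pd.DataFrame.from_dict(df)
print(newdf1)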

output:

0  Timestamp  T     Pressure [bar] Input line pressure [bar] Speed [rpm]  \
1             1              5,281                     5,303     219,727   
2             1              5,273                     5,277     219,727   
3             1              5,288                     5,293     205,078   
4             1              5,316                     5,297     219,727   
5             1              5,314                     5,307     219,727   
6             1              5,288                       5,3     219,727   
7             1  5,318000000000001                      5,31     219,727   
8             1              5,304                       5,3     219,727   
9             1              5,291                      5,29     219,947   
10            1              5,316                     5,297     219,507   

0  Angular Position [degree] Wheel speed [rpm]  \
1                     10,283           216,363   
2                     11,602           216,363   
3                     12,832           216,363   
4                      14,15           216,363   
5                     15,469           216,363   
6                     16,787           216,363   
7                     18,105           216,363   
8                     19,424           216,388   
9                     20,742           216,388   
10                    22,061           216,388   

0  Wheel angular position [degree]    
1                               45    
2                               45    
3                               45    
4                               45    
5                               45    
6                               45    
7                               45    
8                            56,25    
9                            56,25    
10                           56,25

Perfect! And how can I put the first row as header? – jartymcfly Jun 18 '18 at 07:33
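Note that splitting on ';' leaves every value as a string, and the numbers in the question use a comma as the decimal separator. If you need numeric columns afterwards, something along these lines might help. This is only a sketch: the shortened sample and the numeric_cols list are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Shortened sample in the same shape as the question's list
rows = [
    "Timestamp;T;Pressure [bar];Speed [rpm];",
    ";1;5,281;219,727;",
    ";1;5,318000000000001;205,078;",
]
mat = [r.split(";") for r in rows]
df = pd.DataFrame(mat[1:], columns=mat[0])

# Columns assumed to hold decimal-comma numbers (chosen for this sample)
numeric_cols = ["Pressure [bar]", "Speed [rpm]"]
for col in numeric_cols:
    # Swap the decimal comma for a point, then parse the text as numbers
    df[col] = pd.to_numeric(df[col].str.replace(",", ".", regex=False))

print(df[numeric_cols].dtypes)
```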

user2314737 · Answer 3 · 2018-11-30T08:48:50.973

You could use pd.DataFrame.from_records(), splitting each string in the input list on ';' and using the first line of your data as the column labels:

>>> data = ['Timestamp;T;Pressure [bar];Input line pressure [bar];Speed \
[rpm];Angular Position [degree];Wheel speed [rpm];Wheel angular position [degree];', \
';1;5,281;5,303;219,727;10,283;216,363;45;', \
';1;5,273;5,277;219,727;11,602;216,363;45;', \
';1;5,288;5,293;205,078;12,832;216,363;45;', \
';1;5,316;5,297;219,727;14,15;216,363;45;', \
';1;5,314;5,307;219,727;15,469;216,363;45;', \
';1;5,288;5,3;219,727;16,787;216,363;45;', \
';1;5,318000000000001;5,31;219,727;18,105;216,363;45;', \
';1;5,304;5,3;219,727;19,424;216,388;56,25;', \
';1;5,291;5,29;219,947;20,742;216,388;56,25;', \
';1;5,316;5,297;219,507;22,061;216,388;56,25;']

>>> df = pd.DataFrame.from_records([r.split(';') for r in data[1:]], columns=data[0].split(';'))

>>> df
  Timestamp  T     Pressure [bar] Input line pressure [bar] Speed [rpm]  \
0            1              5,281                     5,303     219,727
1            1              5,273                     5,277     219,727
2            1              5,288                     5,293     205,078
3            1              5,316                     5,297     219,727
4            1              5,314                     5,307     219,727
5            1              5,288                       5,3     219,727
6            1  5,318000000000001                      5,31     219,727
7            1              5,304                       5,3     219,727
8            1              5,291                      5,29     219,947
9            1              5,316                     5,297     219,507

 ...

score 0 · Answer 4 · answered Jun 26 '21 at 15:02

0

A shorter version based on @Nihal's solution:

df = [n.split(';') for n in raw_data_text]
df = pd.DataFrame(df[1:], columns=df[0])
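One thing to watch with the split-based approaches: every line in the question ends with ';', so split(';') produces a trailing empty string and the dataframe gets an extra column with an empty name. A possible way around it, as a sketch only (split_line is a hypothetical helper of mine, and the sample data is shortened):

```python
import pandas as pd

# Shortened sample in the same shape as the question's list
raw_data_text = [
    "Timestamp;T;Pressure [bar];",
    ";1;5,281;",
    ";1;5,273;",
]


def split_line(line, sep=";"):
    """Split one line on sep, ignoring a single trailing separator."""
    if line.endswith(sep):
        line = line[: -len(sep)]
    return line.split(sep)


rows = [split_line(line) for line in raw_data_text]
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.columns.tolist())
```

Only one trailing separator is removed, so the leading empty Timestamp field in each row is kept.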

answered Jun 26 '21 at 15:02

matt91t


prerak sethia · Answer 5 · 2022-08-15T07:40:27.503

0

If your model's output is just comma-separated values, you can use this to convert it into a pandas dataframe (content is the output in your Streamlit app):

out = [line.split(",") for line in content.strip().split("\n")]
df1 = pd.DataFrame(out)
df1.columns = df1.iloc[0]
df1 = df1.reindex(df1.index.drop(0))
st.write(df1)

edited Aug 15 '22 at 07:40

answered Aug 15 '22 at 07:36

prerak sethia


score 0 · Answer 6 · edited Feb 26 '23 at 14:50

0

First, read the text file into a variable read_file with pandas.read_csv(), using ';' as the separator. Then write it out as a CSV file with read_file.to_csv(). After that, you can load the dataframe from that file with pd.read_csv().

read_file = pd.read_csv('variable.txt', sep=';')
# to_csv writes the file and returns None when given a path, so don't assign its result
read_file.to_csv('variable.csv', index=False)
df = pd.read_csv('variable.csv')

I believe answers to the same or similar problems can be found here: Load data from txt with pandas
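For what it's worth, the first read_csv call already returns a DataFrame, so the round trip through a second file may not be needed. A minimal sketch, assuming the text file holds plain ';'-separated lines with decimal commas (the temporary file and its contents are made up here so the example runs on its own):

```python
import os
import tempfile

import pandas as pd

# Write a small sample file so the sketch is self-contained; in practice
# this would be the existing text file (variable.txt in the answer above)
sample = "Timestamp;T;Pressure [bar];\n;1;5,281;\n;1;5,273;\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "variable.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # A single read_csv call already returns a DataFrame;
    # decimal="," parses values such as "5,281" as floats
    df = pd.read_csv(path, sep=";", decimal=",")

print(df.head())
```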

edited Feb 26 '23 at 14:50

S.B


answered Feb 21 '23 at 22:36

Toshiro


Remember that Stack Overflow isn't just intended to solve the immediate problem, but also to help future readers find solutions to similar problems, which requires understanding the underlying code. This is especially important for members of our community who are beginners, and not familiar with the syntax. Given that, can you edit your answer to include an explanation of what you're doing and why you believe it is the best approach? – Jeremy Caney Feb 22 '23 at 03:04
