-1

I have a list that looks like this:

Sum = ['* Report_type         Leach\n',
       '* Result_text         Concentration \n',
       '* Run_Id              179\n',
       '* Location            MUENSTER\n',
       '* Meteo_station       KREM-M\n',
       '* Soil_type           KREM\n',
       '* Crop_calendar       SUGARBEET\n',
       '* Substance           ABC\n',
       '* Application_scheme  DRY\n',
       '* Deposition_scheme   No\n',
       '* Results             0.0001\n'
       ]

and I want to convert it to a pandas dataframe like this:

df =    
        col1                col2
0       Report_type         Leach          
1       Result_text         Concentration
2       Run_Id              179                
3       Location            MUENSTER      
4       Meteo_station       KREM-M             
5       Soil_type           KREM       
6       Crop_calendar       SUGARBEET     
7       Substance           ABC                
8       Application_scheme  DRY                
9       Deposition_scheme   No                 
10      Results             0.0001

The first columns of characters in the list has a fixed lenght.

Bob
  • 443
  • 7
  • 15
  • Here's a StackOverflow post on parsing fixed-width fields with Python. The original post specifically mentioned fixed-length: https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files – jsmart Jul 30 '20 at 19:07

2 Answers2

3

IIUC:

df = pd.DataFrame([i.split(maxsplit=2)[1:] for i in Sum],columns=['col1','col2'])

Output:

                  col1           col2
0          Report_type          Leach
1          Result_text  Concentration
2               Run_Id            179
3             Location       MUENSTER
4        Meteo_station         KREM-M
5            Soil_type           KREM
6        Crop_calendar      SUGARBEET
7            Substance            ABC
8   Application_scheme            DRY
9    Deposition_scheme             No
10             Results         0.0001
Juan C
  • 5,846
  • 2
  • 17
  • 51
  • Great solution but what if there are more than one word in column two? Like instead of "Concentration" it was " Concentration of Water". Then it will split at each blank spaces and creates two more columns. – Bob Jul 30 '20 at 19:12
  • 1
    Hey @Bob, edited my answer so it splits a maximum of two times, thus keeping the whole phrase in column two if it has more than one word – Juan C Jul 30 '20 at 19:14
1

Using str methods

Ex:

data = ['* Report_type         Leach\n',
       '* Result_text         Concentration \n',
       '* Run_Id              179\n',
       '* Location            MUENSTER\n',
       '* Meteo_station       KREM-M\n',
       '* Soil_type           KREM\n',
       '* Crop_calendar       SUGARBEET\n',
       '* Substance           ABC\n',
       '* Application_scheme  DRY\n',
       '* Deposition_scheme   No\n',
       '* Results             0.0001\n'
       ]

df = pd.DataFrame({"Col": data})
df[['col1', 'col2']] = df.pop('Col').str.strip(" * ").str.split(expand=True)
print(df)

Output:

                  col1           col2
0          Report_type          Leach
1          Result_text  Concentration
2               Run_Id            179
3             Location       MUENSTER
4        Meteo_station         KREM-M
5            Soil_type           KREM
6        Crop_calendar      SUGARBEET
7            Substance            ABC
8   Application_scheme            DRY
9    Deposition_scheme             No
10             Results         0.0001
Rakesh
  • 81,458
  • 17
  • 76
  • 113