-1

How to seperate dataframe as follows:

   yr   mon  day      Tmax  Tmin   pcp
  2013   4  22        5.09-10.92   0.0                         
  2013   4  23        2.77 -9.63   0.5                         
  2013   4  24        0.28 -9.90   9.9                         
  2013   4  25        0.76 -6.70  12.2                         
  2013   4  26       -0.35 -9.48   0.0                         
  2013   4  27        7.22-10.47   0.0                         
  2013   4  28        4.19-10.78   0.0  

you see: there are no whitespaces between Tmax and Tmin in principle. The max width of Tmax and Tmin are 6 char-spaces. If less than 6 spaces , filled by whitespace. I want to read it to df and seperate each columns.
seperate columns as given char length?

Cobin
  • 888
  • 13
  • 25

2 Answers2

2

try this:

df = pd.read_fwf(filename)
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
1

It seems you need str.extract floats and ints, solution works if all data are in one column which is selected by iloc:

pat="(\d+)\s*(\d+)\s*(\d+)\s*([-+]?\d+\.\d+|\d+)\s*([-+]?\d+\.\d+|\d+)\s*([-+]?\d+\.\d+|\d+)"
df1 = df.iloc[:, 0].str.extract(pat, expand=True)
df1.columns = ['year', 'mont','day','Tmax','Tmin','pcp']
print (df1)
   year mont day   Tmax    Tmin   pcp
0  2013    4  22   5.09  -10.92   0.0
1  2013    4  23   2.77   -9.63   0.5
2  2013    4  24   0.28   -9.90   9.9
3  2013    4  25   0.76   -6.70  12.2
4  2013    4  26  -0.35   -9.48   0.0
5  2013    4  27   7.22  -10.47   0.0
6  2013    4  28   4.19  -10.78   0.0

Another solution is use read_fwf and specify colspecs:

import pandas as pd
from pandas.compat import StringIO

temp=u"""yr  mon  day       Tmax  Tmin   pcp
  2013   4  22        5.09-10.92   0.0                         
  2013   4  23        2.77 -9.63   0.5                         
  2013   4  24        0.28 -9.90   9.9                         
  2013   4  25        0.76 -6.70  12.2                         
  2013   4  26       -0.35 -9.48   0.0                         
  2013   4  27        7.22-10.47   0.0                         
  2013   4  28        4.19-10.78   0.0  """
#after testing replace 'StringIO(temp)' to 'filename.csv'

names=['year', 'mont','day','Tmax','Tmin','pcp']
colspecs = [(0, 6), (9, 10), (12, 14), (21, 26),(26,32),(34,38)]
df = pd.read_fwf(StringIO(temp),colspecs=colspecs, names=names, header=0)
print (df)
   year  mont  day  Tmax   Tmin   pcp
0  2013     4   22  5.09 -10.92   0.0
1  2013     4   23  2.77  -9.63   0.5
2  2013     4   24  0.28  -9.90   9.9
3  2013     4   25  0.76  -6.70  12.2
4  2013     4   26 -0.35  -9.48   0.0
5  2013     4   27  7.22 -10.47   0.0
6  2013     4   28  4.19 -10.78   0.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252