0

My Data Looks Like this:

statnr      datum       ele h01    h02    h03    h04    h05    h06    h07    h08    h09    h10    h11    h12    h13    h14    h15    h16    h17    h18    h19    h20    h21    h22    h23    h24    
----------- ----------- --- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ 
   20101    20020401    D6K    103    126    115    114    105    101    118    118    130    129    126    128    132    133    131    130    130    131    130    130    125    117    122    124 
   20101    20020402    D6K    126    118    119    120    114    111    107    119    124    126    122    130    130    130    128    128    126    119    129    134    132    127    112    118 
   ........
   20101    20150909    D6K     72     82     75     76     82     93     91     96     99    101    108    108    103    100     94     90     82     92     88     79     77     89     94     92
   20101    20020401    FLP     54     61     58     61     66     67     65     56     47     46     40     40     39     32     34     34     37     43     45     45     50     54     59     63 
   20101    20020402    FLP     64     61     67     66     68     69     67     56     50     46     42     39     33     32     33     34     39     48     55     58     61     62     65     68 
   ........
   20101    20150909    FLP     93     95     92     94     94     96     95     92     90     84     87     75     81     75     75     74     83     87     89   96     94     92     91     94 
   20101    20070906    GSE  32700      0      0      0      0      0      3     10     17     30     28     27     37     44     37     25     16      5      1      0      0      0      0      0 
   20101    20070907    GSE      0      0      0      0      0      0     11     48     72    107    257    264    290    216    255    178    122     57      6      0      0      0      0      0 
   ........
   20101    20150909    GSE      0      0      0      0      0      1     17     51     71    118     82    200    116    130    142    156     48     15      1      0      0      0      0      0 
   20101    20020101    SUV      0      0      0      0      0      0      0      0      9     10     10     10     10     10     10     10      2      0      0      0      0      0      0      0 
   ........
   20101    20150909    SUV      0      0      0      0      0      0      0      0      0      1      0      5      1      4      4      9      2      0      0      0      0      0      0      0
   20101    20020401    TEX     30     18     21     18      9     10     18     42     69     91    114    117    126    135    133    127    114     87     58     47     39     33     27     24 
   ........
   20101    20150909    TEX     50     46     48     50     50     49     57     67     77     85     80    111     95    100    101     92     74     67     59     53     49     49     49     47 
   20101    20020401    QVX      6     10      9      8     13     25     19     15     16     19     24     24     19     23     24     22     24     23     19     13     12     16     16     18 
   ........
   20101    20150909    QVX     40     42     37     34     30     34     22     22     27     31     26     28     37     38     42     43     52     54     59     81     80     69     78     60 

as you can see it is a huge sheet with a statnr Row, DateRow, ele stands for the parameter and than h01 - h24 are as you can imagine the hours. I need to adjust the format from that Sheet to the Format of the other Files I'm working with (Plotting and processing reasons)

I'm currently trying to bring this FileSheet into this Format:

Date        Time    D6K FLP GSE SUV TEX QVX 
01.04.2002  01:00   103 54      0    30 6
.....
09.09.2015  23:59   92  94   0  0    47 60

So what I'm trying to do is:

1) Get rid of row[0] (statnr)

2) Switch the Header with Row[2] so that all parameters are in the header and link them to the new Time Date fmt in the lines

3) Convert the time fmt from %H%M%D to %D%M&Y %H:%M

Since I'm new to python and coding I thought I'd ask if there's maybe a package out there that deals with that kind of Problem, and if there's a term for that Problem in general (switching header with lines) --> thanks (Peter Wood) I switched the Title to Transpose

Thanks for suggestions

For Clarification: the ........ indicates that I left some rows out the ----------- are in the file

Peter S
  • 625
  • 1
  • 9
  • 32
  • 2
    The term is [**`transpose`**](https://en.wikipedia.org/wiki/Transpose). – Peter Wood Sep 24 '15 at 07:49
  • possible duplicate of [Row-to-Column Transposition in Python](http://stackoverflow.com/questions/10507104/row-to-column-transposition-in-python) – Peter Wood Sep 24 '15 at 07:51
  • 1
    Not a duplicate - the conversion is a lot more involved than simple transposition. – nekomatic Sep 24 '15 at 07:59
  • Do the `........` rows indicate that you've snipped out some rows with the same `ele` value in order to show us the structure, or are those actually in your data? Does your data set always contain the same set of dates for every value of `ele` or can some of them be missing? And finally is this just a one-off or do you need to write code to do this on several files? – nekomatic Sep 24 '15 at 08:03
  • no that indicates that I'v stripped out some rows. Some values could be missing. For example for the GSE parameter, that timeseries starts `20070907` while for the others it starts at `20020401`, in between there are definetely some missing days for the values – Peter S Sep 24 '15 at 08:08
  • I need the code for 3 Files, that date back to the year 1945. The all look the same so I think I need the code once and just load in another file – Peter S Sep 24 '15 at 08:10

1 Answers1

1

Because you may have missing data, this isn't a simple case of transposing blocks. I think what you need to do is read the input file into a data structure from which you can then look up the values as required to generate your output. In Python you can use a dictionary whose key is a tuple of your element type, date, and hour:

mydict = {}

with open('F:\myfile.txt') as f:
    z = f.readline() # discard headings
    z = f.readline() # discard row of dashes
    for line in f:
        fields = line.split()
        date = fields[1]
        ele = fields[2]
        for hour, value in enumerate(fields[3:27]):
            mydict[(ele, date, hour)] = value

Now you have all the data in a big dictionary that's addressable by ele, date and hour. I'm going to guess that the ele values are fixed and you can hardcode them, but you'll want to build a list of the unique dates you actually found in the input file, and put them in ascending order:

dateset=set()
for k in mydict.keys():
    dateset.add(k[1])
dates=list(dateset)
dates.sort()

Now you're ready to build your output file.

for date in dates:
    for hour in range(24):
        output = date + '\t' + hour
        for ele in ['D6K', 'FLP', 'GSE', 'SUV', 'TEX', 'QVX']:
            output = output + '\t' + mydict.get((ele, date, hour), '')
    print(output)

Using the get method on the dictionary allows you to specify a default value to be returned if the key you supplied isn't in the dictionary.

I haven't dealt with the date formatting (note that 'hour' ranges from 0 to 23), or writing the output to a file, but the above should get you going.

nekomatic
  • 5,988
  • 1
  • 20
  • 27
  • i edited `output = date + '\t' + hour` to `output = str(date) + '\t' + str(hour)` It gets me quite far but somehow the hour hasn#t made it to the output:`20130209 23 309 60 0 0 -106 26` and `20130210 23 280 82 0 0 -119 35` – Peter S Sep 24 '15 at 14:31
  • there`s of course more in the output... just a little sample – Peter S Sep 24 '15 at 14:32
  • 1
    `hour` in my code will just be an integer between 0 and 23, which comes from the `enumerate` function. Isn't the second item in your output the hour value? 0 corresponds to your `h01` column, etc. You can figure out how to convert it to a time in your chosen format, I'm sure. – nekomatic Sep 24 '15 at 15:18
  • hi no the second item is always 23, I'll have a look – Peter S Sep 24 '15 at 15:35
  • 1
    I corrected some indentation in my answer, check whether that was your problem. – nekomatic Sep 24 '15 at 19:00
  • It works: I get outputs like: `20110822 17 100 63 71 10 236 21` for some reason the first line in the output is `2 16 86 53 138 10 246 15` but all other lines , I'm reading into date and timefmts now to get `20110822` to `22.08.2011` and the time from `17`to `17:00` – Peter S Sep 25 '15 at 07:49
  • 1
    Not sure what's giving you that first line but sounds as if a rogue blank value is getting into the `dates` list somehow - run the code interactively or `print` values of some variables as you go along to see what's happening. I would definitely test this code with a small dataset and check that all the data is going in the right place! You could achieve the date and time formatting with string operations e.g. if `z` is `20110822` then `z[6:8]+'.'+z[4:6]+'.'+z[0:4]` gives you `22.08.2011`, etc. – nekomatic Sep 25 '15 at 08:10
  • okay kewl I got the Date right, now I'm looking for the last part, the TimeStamp – Peter S Sep 28 '15 at 18:15
  • I got everything working and from the terminal output everything looks good but i struggle with printing everything into a file, what exactly is the Output now, List, Dictionary? – Peter S Sep 28 '15 at 20:26
  • Each `print` statement outputs one line of the output with each field separated by a tab character `\t`. If you want it in a tab-separated file then put the final `for` loop in a `with open ...` block to open your output file for writing and replace the `print` with a `write` - you'll need to write the new line character(s) at the end of each line. Have a look at the Python manuals/tutorials if you need any more help with file I/O. – nekomatic Sep 29 '15 at 08:56