0

Need help converting a txt file to csv with the rows and columns intact. The text file is here: (http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2020&MONTH=06&FROM=2300&TO=2300&STNM=72265)

So far I only have this...

df = pd.read_csv('sounding-72265-2020010100.txt',delimiter=',')
df.to_csv('sounding-72265-2020010100.csv')

But it has only one column with all the other columns within its rows. Instead want with to format it to something like this

CSV Format

Thanks for any help

1 Answers1

0

I'm assuming you can start with text copied from the website; i.e. you create a data.txt file looking like the following by copy/pasting:

1000.0      8                                                               
  925.0    718                                                               
  909.0    872   39.6    4.6     12   5.88     80      7  321.4  340.8  322.5
  900.0    964   37.6   11.6     21   9.62     75      8  320.2  351.3  322.1
  883.0   1139   36.6    7.6     17   7.47     65      9  321.0  345.3  322.4
...
...
...

Then the following works, mainly based on this answer:

import pandas as pd

df = pd.read_table('data.txt', header=None, sep='\n')
df = df[0].str.strip().str.split('\s+', expand=True)

You read the data only separating by new lines, generating a one column df. Then use string methods to format the entries and expand them into a new DataFrame.

You can then add the column names in as such with help from this answer:

col1 = 'PRES   HGHT   TEMP   DWPT   RELH   MIXR   DRCT   SKNT   THTA   THTE   THTV'.split()
col2 = 'hPa     m      C      C      %    g/kg    deg   knot     K      K      K '.split()

df.columns = pd.MultiIndex.from_tuples(zip(col1,col2), names = ['Variable','Unit'])

The result (df.head()):

Variable    PRES  HGHT  TEMP  DWPT  RELH  MIXR  DRCT  SKNT   THTA   THTE   THTV
Unit         hPa     m     C     C     %  g/kg   deg  knot      K      K      K
0         1000.0     8  None  None  None  None  None  None   None   None   None
1          925.0   718  None  None  None  None  None  None   None   None   None
2          909.0   872  39.6   4.6    12  5.88    80     7  321.4  340.8  322.5
3          900.0   964  37.6  11.6    21  9.62    75     8  320.2  351.3  322.1
4          883.0  1139  36.6   7.6    17  7.47    65     9  321.0  345.3  322.4

I would actually probably drop the "Units" column name were it me, b/c I think the multiindex columns can make things more complicated to slice.

Again, both reading the data and column names assume you can just copy paste those into a text file/into Python and then parse. If you are reading many pages like this, or were looking to do some sort of web scraping, that will require additional work.

Tom
  • 8,310
  • 2
  • 16
  • 36