I am working with some data that is "waterfalled" at first (in Excel) but becomes normal after a certain number of rows.

Essentially, I need the script to remove the empty data rows and keep only the full rows. The location of the first full row is variable, and I am not very experienced with Python, so I am not sure how to filter out the "waterfall" rows.

I attached a simple example of what I mean. In it, row 10 is where the script would need to start keeping rows: Excel Example

Here is what I have written so far. It assumes the .py script is in the same folder as the data that needs to be filtered, along with two folders, 'Archive' and 'DownSampled'.

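import os
import shutil

# Process every regular file in the current folder, skipping sub-folders
# (such as 'Archive' and 'DownSampled') and Python scripts like this one.
for filename in os.listdir('./'):
    if not os.path.isfile(filename) or filename.endswith('.py'):
        continue

    # Read all lines from the raw data file.
    with open(filename, 'r') as f:
        lines = f.readlines()

    # Move the original file into the archive folder.
    shutil.move(filename, './Archive')

    # Create the output file in the DownSampled folder.
    with open('./DownSampled/' + filename + '.csv', 'w') as f_out:
        pass  # filter the data and write the kept rows here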
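For illustration only, the missing filter step could be sketched with the standard-library `csv` module. This assumes the files are comma-delimited text and that a "waterfall" row shows up either as blank cells or as a row with fewer fields than the full rows. The helper name `keep_full_rows` and the sample data are hypothetical.

```python
import csv
import io


def keep_full_rows(lines):
    """Return only the rows that have every column present and non-empty."""
    # Parse the lines and discard completely blank ones.
    rows = [row for row in csv.reader(lines) if row]
    if not rows:
        return []
    # Assumption: the widest row tells us how many columns a full row has.
    width = max(len(row) for row in rows)
    return [
        row for row in rows
        if len(row) == width and all(cell.strip() for cell in row)
    ]


# Hypothetical sample: the first two data rows are incomplete.
sample = ["a,b,c\n", "1,,\n", "2,5,\n", "3,6,9\n", "4,7,10\n"]
rows = keep_full_rows(sample)

# Write the kept rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

In the script above, `lines` from `f.readlines()` could be passed to such a helper, and the result written to `f_out` with `csv.writer`.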
Citut
  • SO is not a code-writing-service, be so kind and provide some code please! – linusg May 09 '16 at 13:12
  • Sure, like I said, I don't really know where to start, but I guess I have the file-open part written. – Citut May 09 '16 at 13:19
  • So share that with us! We need a point to start... – linusg May 09 '16 at 13:19
  • Well, I added everything that I have figured out. It is in the original post! Essentially, opening the file(s), reading the lines, moving the original file to archive, creating a new file (in Downsampled folder), filtering, and saving. – Citut May 09 '16 at 13:23
  • Maybe consider using the pandas dropna feature? [See here, which has a lot of good solutions on how to remove rows from a dataframe](http://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-of-certain-column-is-nan) – Clusks May 09 '16 at 15:02
  • Thanks for that, I ended up using that feature. – Citut May 09 '16 at 15:58

1 Answer


I ended up using the pandas dropna() feature. Thank you.
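The exact code used is not shown here, so the following is only a minimal sketch of how `dropna()` might be applied, assuming the data can be read with `pandas.read_csv` and that the blank "waterfall" cells are parsed as missing values (NaN). The sample data and variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the "waterfall": the early rows are missing
# values in the later columns, and the later rows are complete.
raw = io.StringIO(
    "a,b,c\n"
    "1,,\n"
    "2,5,\n"
    "3,6,9\n"
    "4,7,10\n"
)

df = pd.read_csv(raw)

# how="any" (the default) drops every row with at least one missing value.
full_rows = df.dropna(how="any")

# Alternative: keep everything from the first complete row onward, which
# preserves later rows even if they contain an occasional gap.
complete = df.notna().all(axis=1)
from_first_full = df.loc[complete.idxmax():] if complete.any() else df.iloc[0:0]

print(full_rows.to_csv(index=False))
```

`dropna()` is the simpler choice when every kept row must be complete. The slice-based alternative fits better when only the leading "waterfall" section should be discarded.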

Citut