I have a data frame like this:

     0     1                    2
0  240  RSOW  2008-07-11 20:35:00
1  250  RSOW  2008-06-27 19:10:00
...

I want to sort it by date using column 2. However, further down some of the dates appear as \N, and I want to omit those rows. The second issue is the format: the date and the time are stored in the same column. How can I sort this with pandas without running into problems with the \N values and the combined date and time?

ggegoge
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jan 26 '18 at 09:06

1 Answer

I think I've solved this myself.

rows = []
for index, row in subset.iterrows():
    try:
        yr = int(row[2][:4])  # the first four characters must parse as a year
        if yr > 2000:
            rows.append(row)
    except ValueError:
        # placeholders such as \N are not numeric, so skip the row
        continue

If the value is some kind of NaN-like placeholder, for example the \N I mentioned, a ValueError is raised and the row is left out of further analysis. Next, I simply took the list of rows (rows) and used a list comprehension to create a new data frame:

import pandas as pd

# keep only the date strings from the rows that passed the check
dic = {"date": [row[2] for row in rows]}
df = pd.DataFrame(dic)
df = df.sort_values(by="date")

Quite a rookie solution, I must admit.
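The same filtering and sorting can probably be done without an explicit loop. The following is only a sketch under assumptions: the sample frame is hypothetical (the \N row is made up to stand in for the missing dates), and the format string is inferred from the two rows shown in the question. It uses `pd.to_datetime` with `errors="coerce"` so that unparseable values become `NaT`, then keeps the valid rows and sorts them.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout; the \N row is invented
# to stand in for the missing dates described in the question.
subset = pd.DataFrame({
    0: [240, 250, 260],
    1: ["RSOW", "RSOW", "RSOW"],
    2: ["2008-07-11 20:35:00", "2008-06-27 19:10:00", r"\N"],
})

# Parse column 2 as datetimes. Values that do not match the assumed format
# (such as \N) are turned into NaT instead of raising an error.
parsed = pd.to_datetime(subset[2], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Keep rows with a valid date after the year 2000, as in the loop above.
valid = parsed.notna() & (parsed.dt.year > 2000)

# Attach the parsed dates as a new column and sort by it.
with_dates = subset.assign(date=parsed)
result = with_dates[valid].sort_values(by="date")
print(result)
```

Unlike the list-comprehension version, this keeps the other columns, and because the new `date` column holds real timestamps, the date and time sort together chronologically.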

ggegoge