On my project I am trying to sort the rows of a CSV sheet by a single column. I am using pandas and have seen several examples posted around the internet, but I have been unable to get it working myself.

db = pd.read_csv(databasefile, skip_blank_lines=True, names=['ExampleOne','ExampleTwo','ExampleThree','ExampleFour'], header=1)
db.drop_duplicates(inplace=True)

db.sort_values(by=['ExampleOne'], ascending=[True])

db.to_csv(databasefile, index=False)

My understanding of the code above is that it reads the CSV into a pandas DataFrame, drops any duplicated rows, sorts by the ExampleOne column, and finally writes the result back to the CSV. However, the code runs with no errors and the CSV is still not sorted in any order.

Database CSV Link

Here is the CSV in txt format. The first 60 or so rows are sorted, but that is because earlier in this process I combine multiple CSVs into one.

Thank you for reading! I would appreciate any help or suggestions, as this problem has been frustrating for me.

Timberghost_
  • `db.sort_values(by=['ExampleOne'], ascending=[True], inplace=True)` or you can chain the operations: `db.sort_values(by=['ExampleOne'], ascending=[True]).to_csv(databasefile, index=False)`. – Quang Hoang Sep 23 '19 at 18:37
  • @QuangHoang I have yet to try that out as I am updating some software on my computer. Would chaining the .to_csv function to the sort function actually make a difference? Thank you for the reply, I'll be sure to check it out when my computer is back online. – Timberghost_ Sep 23 '19 at 18:49
  • Yes, sort_values without inplace returns the sorted DataFrame; it does not sort your original. – Quang Hoang Sep 23 '19 at 18:51
  • @QuangHoang Thank you very much! It works flawlessly now. I didn't realize that sorting did not replace the original dataframe. Once again, thank you! – Timberghost_ Sep 23 '19 at 19:21

1 Answer

import pandas as pd

databasefile = r"path"
databasefile2 = r"path"
db = pd.read_csv(databasefile, skip_blank_lines=True, names=['ExampleOne','ExampleTwo','ExampleThree','ExampleFour'], header=1)
print(db['ExampleOne'])
db.drop_duplicates(inplace=True)
# sort_values returns a new, sorted DataFrame, so write that result straight to the CSV
db.sort_values(by=['ExampleOne'], ascending=True).to_csv(databasefile, index=False)

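Here is a solution to your problem. Without inplace=True, sort_values returns a new sorted DataFrame and leaves db untouched, so to_csv has to be called on the returned result.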
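To see the behaviour without touching a real file, here is a minimal sketch. The sample data, the two-column layout, and the in-memory buffer standing in for the CSV file are all assumptions for illustration; only the column name ExampleOne comes from the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV contents.
db = pd.DataFrame({
    "ExampleOne": [3, 1, 2, 1],
    "ExampleTwo": ["c", "a", "b", "a"],
})
db = db.drop_duplicates()

# Without inplace=True the sorted result is returned and, here, discarded;
# db keeps its original row order.
db.sort_values(by="ExampleOne")
unsorted_order = db["ExampleOne"].tolist()

# Option 1: keep the returned DataFrame (or chain .to_csv onto it).
sorted_db = db.sort_values(by="ExampleOne")
sorted_order = sorted_db["ExampleOne"].tolist()

# Option 2: sort a DataFrame in place.
db_inplace = db.copy()
db_inplace.sort_values(by="ExampleOne", inplace=True)

# Write the sorted rows to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
sorted_db.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Either option should give the same sorted CSV; reassigning or chaining tends to be easier to follow than inplace=True, but that is a matter of preference.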

sns
Jiraiya