Using .sort_values() in Python

Question

I am trying to figure out how to sort a dataframe base on the values of a column.

At the moment, it is re-organising the dataframe order, but not in order of smallest -> largest or largest -> smallest. It seems to be a random order.

What am I doing wrong?

it looks like the column values are str, not numeric. Note there are commas. — buran, Feb 21 '22 at 19:05
Please, show [mre]. Don't post images of code, error, data. Check https://stackoverflow.com/q/20109391/4046632 — buran, Feb 21 '22 at 19:06
Does this answer your question? [pandas reading CSV data formatted with comma for thousands separator](https://stackoverflow.com/questions/37439933/pandas-reading-csv-data-formatted-with-comma-for-thousands-separator) — buran, Feb 21 '22 at 19:08
Read the CSV using the `dtype` param (of `read_csv`) as `dtype={'Impressions': int}`. Then, sort will work naturally / as expected. — S3DEV, Feb 21 '22 at 19:42
@S3DEV, without `thousands=','` this will raise `ValueError: invalid literal for int() with base 10` — buran, Feb 21 '22 at 19:44
@buran - Ah yes. Then both could be used (?) to read the data in properly from source, rather than having to fix it later. — S3DEV, Feb 21 '22 at 19:57
@S3DEV, I **guess** their file uses comma as both separator and thousands separator and some values are quoted. Just supplying `thousands` should work and it will infer the type correctly. But it may be necessary to pass `sep` and/or `quoting`. Of course it does not hurt to also explicitly specify `dtype`. If OP has given sample data we can be more certain. — buran, Feb 21 '22 at 20:04

score 0 · Answer 1 · answered Feb 21 '22 at 19:07

It looks like the Impressions column is stored as a string rather than as an int. So, it's sorting the digit representations in "alphabetical" order.

One solution is to change the values to int within the data frame. Another is to use a key for the sort_values call that converts the string to a number.

score 0 · Answer 2 · answered Feb 21 '22 at 19:10

0

The second option that @BenGrossmann said looks like this:

data.sort_values(by='Impressions', key=lambda col: int(col.replace(",","")))

answered Feb 21 '22 at 19:10

kpie

9,588
5
28
50

OP should read the data using `thousands` param of `read_csv` See the dup. – buran Feb 21 '22 at 19:11
yeah that's not a bad idea at all. – kpie Feb 21 '22 at 19:12
Or, read the CSV using the `dtype` param (of `read_csv`) as `dtype={'Impressions': int}`. Then, sort will work naturally without the need for a `lambda` hack. – S3DEV Feb 21 '22 at 19:40
@S3DEV, without `thousands=','` this will raise `ValueError: invalid literal for int() with base 10` – buran Feb 21 '22 at 19:46

Using .sort_values() in Python

2 Answers2