3

I have been importing data from a Pandas Dataframe to a postgres DB using my own custom import script. Unfortunately, my data is not tidy which caused every single one of my columns to be parsed as a text. Is there any way where I could get the entries where a certain column value is NOT a number? My plan is to delete those records and convert the column to a numeric type

Thanks!

Macterror
  • 431
  • 6
  • 20

1 Answers1

9

get the records which have no digits, then you would have to use the following regex:

DELETE FROM myrecords WHERE record ~ '^[^0-9]+$';

Here, the ^ character outside of square brackets means the beginning of the field, the $ character means the end of the field, and we require that all characters in between are non-digits. + indicates that there should be at least one such characters. If we would also allow empty strings, then the regex would look like ^[^0-9]*$.

If you want the records which would include digits and lower-case letter, then I would expect a regex like:

DELETE FROM myrecords WHERE record ~ '[0-9a-z]';
AnouarZ
  • 1,087
  • 8
  • 23
  • the first regex should be `'^[0-9]+$` or you could shorten it to `'^\d+$'` – Haleemur Ali Nov 09 '17 at 18:39
  • 1
    @HaleemurAli yes that's work as well `\d is a metacharacter that matches any digit, which is identical to [0-9]` @Macterror you are welcome, glad it helps – AnouarZ Nov 09 '17 at 18:52
  • A single non-numeric character suffices to determine that a string is not a number. Therefore it would seem more appropriate to use `[^0-9]+` (or `[^\d]+`) — that is, check if at least one character is not a number. – Skippy le Grand Gourou Jul 21 '20 at 16:19
  • @HaleemurAli NO ! With your regex the query would delete all (and only) actual numbers… – Skippy le Grand Gourou Jul 21 '20 at 16:21