0

I am probably missing something obvious and asking a silly thing, but I am unable to do a simple sum.

My data was imported with the character '€' character so I had to import the data as text:

original data sample:
    "€31.51"
    "€0.10"
    "€24.23"

I tried to use a string function to remove the €. I was then hoping to convert to numeric and sum.

SELECT sum(coalesce(CAST(split_part(revenue_eur, '€', 2) as NUMERIC),'0'))
FROM revenue_test;

the only piece that runs is

SELECT coalesce(split_part(revenue_eur, '€', 2),'0')
FROM revenue_test;

What I need is just a sum. Could someone please kindly help me figure it out? I tried doing a subquery but failed in misery..

Maybe there is a way to import the data without the € and into numeric?

Thank you!!!!!!

EDIT: I imported via CSV to pgAdmin4 and using postgres 12 (The file counts to 85k rows in sql) To import the data I tried COPY with the pgAdmin4 query tool but I got error 'permission denied'. I checked all the permissions to my file but I clearly was missing something, the most likely solution I found was that I needed to connect to postgres via the terminal on my mac and use \COPY. But I didn't manage to do that. So I ended up using the right click feature 'import' via pgAdmin.

EDIT 2: I found the problem, when importing a ',' was inserted to one value (mark thousands) so I am unable to cast without taking it away. I found regular expression help on removing a character from a specific order but the , appears randomly. Code works!

WITH test AS
(SELECT translate(revenue_eur, '€,', '')::float as eur
FROM revenue_test)

SELECT sum(coalesce(eur, '0'))
from test;
christinelly
  • 63
  • 1
  • 7
  • You're trying to remove a character, not split. In any case, the best time to clean the data is while loading it, not afterwards. Where was the data imported from? A CSV? Excel? How was it imported? Did the importing program convert an Excel money value to text while importing perhaps? – Panagiotis Kanavos Nov 02 '21 at 18:19
  • You could use `LTRIM(revenue_eur,'€')` to remove the leading currency sign, or `SUBSTRING(revenue_eur,2)` to read from the second character onwards – Panagiotis Kanavos Nov 02 '21 at 18:22
  • Did you use `COPY` to load the data from a file? Or a program to load the data from Excel? How much data is there? – Panagiotis Kanavos Nov 02 '21 at 18:28
  • @PanagiotisKanavos sorry, I didn't mention. I imported via CSV to pgadmin4. I tried COPY with the Pgadmin query tool but I got error 'permission denied' so I right clicked on the table and used the 'import' feature. (I didn't know how to connect via postgresql via the terminal on my mac, which probably would have solved the permission error, I checked all my permissions on the file) There is about 85k rows. – christinelly Nov 02 '21 at 18:35
  • 85K rows isn't all that much. CSV is text which means you can just open the file in an editor and replace € with an empty string. – Panagiotis Kanavos Nov 03 '21 at 09:15
  • You can use a Python script and [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/02_read_write.html) to load the entire file in memory with [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) in memory, [remove the sign in the entire column](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/10_text_data.html) and convert it to a number before [saving to the DB with to_sql](https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table). Python 3 is already installed – Panagiotis Kanavos Nov 03 '21 at 09:17
  • `the , appears randomly.` is it really random ? In a CSV file a field terminator like `,` can appear inside a field as long as the field is quoted. `"John Doe, Mr", €,1000` is a valid row with 3 fields, the first containing `John Doe, Mr` – Panagiotis Kanavos Nov 03 '21 at 09:20
  • @PanagiotisKanavos yes it's random. It seems it was formatted to mark the thousand separator (eg: €1,000). I could use an editor but I want to know how to do it regardless of size as I will have larger data sets in the future with the same issue. I also just want to learn how to do it properly! I will have a look at your python suggestion (I just started learning python so I'll see how I manage, it seems like a great solution:)) – christinelly Nov 03 '21 at 09:37
  • What does the line actually look like? `Blah,"€1,000"` is valid. `Blah,€1,000` is not and whoever generated that file should fix their code. Perhaps there's an option to quote fields. Or perhaps the field separator is `;`, not `,`. In most European countries `,` is the *decimal* separator so `;` is used as a field separator, eg `Blah;1000,34;Blah` – Panagiotis Kanavos Nov 03 '21 at 09:38
  • @PanagiotisKanavos The line is: 2020-12-01,a0a55f88-0209-48b8-bb5b-a8d054e4f895,"€2,389.67" – christinelly Nov 03 '21 at 09:43
  • So it *is* a valid line. – Panagiotis Kanavos Nov 03 '21 at 09:44
  • @PanagiotisKanavos I might have misunderstood this but my reasoning is that because of the € I need to import as text, postgres doesn't allow me to import in numeric, So the , gets stored as a text, if there was no € and it was imported as numeric the , would have been valid. – christinelly Nov 03 '21 at 09:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/238827/discussion-between-christinelly-and-panagiotis-kanavos). – christinelly Nov 03 '21 at 10:16

0 Answers0