4

I'm trying to export a CSV string to a D3 web application, but the command to_csv insists on adding a trailing 0 to the data, which prevents a proper use by D3.

Here's a minimal example to illustrate the problem.

My (simplified) dataframe is:

>>> df = pd.DataFrame([['Alex',20.0000, 50.650]],columns=['Name','Age', 'Weight'])
   Name   Age  Weight
0  Alex  20.0   50.65

df['Age'] contains a float, as indicated by:

>>> df['Age']
0    20.0
Name: Age, dtype: float64

Then based on this answer I run .astype(object) to get the format I would like:

>>> df=df.astype(object)
   Name Age Weight
0  Alex  20  50.65

Now, df['Age'] contains an object, with no trailing zero:

>>> df['Age']
0    20
Name: Age, dtype: object

That's what I would like to export with to_csv, but this command reappends a trailing 0 to the number, which I want to avoid:

>>> df_csv = df.to_csv(sep=',', index = False)
>>> df_csv
'Name,Age,Weight\nAlex,20.0,50.65\n'

I tried using df_csv = df.to_csv(sep=',', index = False, float_format='%.0f') based on this answer, but that doesn't work because there are other floats in my dataframe for which I wish to keep non-zero decimals.

How could I prevent this trailing 0 to appear for numbers with no decimals?

sc28
  • 1,163
  • 4
  • 26
  • 48
  • 1
    I should point out that the answers seem to be misformatted, maybe because `d3.js` is one of the tags. I went ahead and fixed everyone's formatting. – xjcl Nov 10 '20 at 15:35

4 Answers4

9

Another way using Pandas replace :

df = df.astype(str)
df = df.replace(to_replace = "\.0+$",value = "", regex = True)

This way you don't need to import any extra module.

xjcl
  • 12,848
  • 6
  • 67
  • 89
cyrilb38
  • 924
  • 6
  • 17
1

Have you tried df['Age'] = df['Age'].astype(int)

this gets me

    Name    Age     Weight
0   Alex    20  50.65

converting a column to type object essentially allows the column to hold floats, ints, strings etc. vs a typed column which will only hold that type.

And converting it to a csv:

df_csv = df.to_csv(sep=',', index = False)
    
'Name,Age,Weight\r\nAlex,20,50.65\r\n'
xjcl
  • 12,848
  • 6
  • 67
  • 89
Mason Caiby
  • 1,846
  • 6
  • 17
  • Thanks, yes, I tried that, but I would need a "batch" approach, because there are hundreds of columns (some which may be added by a third party in the future). Therefore I would prefer to avoid hardcoding individual column names, which was the advantage of `.astype(obj)`. Do you know why the `object` type effectively removes the trailing 0 in the df, but not in the resulting csv? – sc28 May 28 '19 at 15:19
0

Here is one way that will work as long as you don't have spaces within your string columns.

Use to_string() after astype(object) instead of to_csv(). This will preserve the numeric formats, but it will use spaces as the separator. As long as you don't have spaces in any of your other fields, you can use regular expressions to convert the spaces to commas.

import re
df = df.astype(object)
df_string = re.sub(" +", ",", df.to_string(index=False))
print(df_string)
#Name,Age,Weight
#Alex,20,50.65

Now write df_string to your file:

with open('path/to/some/file.csv', 'w') as f:
    f.write(df_string)
xjcl
  • 12,848
  • 6
  • 67
  • 89
pault
  • 41,343
  • 15
  • 107
  • 149
0

I really hate this disprecancy between pandas.DataFrame.to_string and pandas.DataFrame.to_csv. However I salvaged the data by copying it into a new DataFrame using applymap:

df_fixed = df.applymap(lambda cell: int(cell) if str(cell).endswith('.0') else cell)
>>> df.to_csv()
',Name,Age,Weight\n0,Alex,20.0,50.65\n'

>>> df_fixed.to_csv()
',Name,Age,Weight\n0,Alex,20,50.65\n'

Note this doesn't work with large numbers (e.g. 10**7) because it'll start using e notation (1e7)

xjcl
  • 12,848
  • 6
  • 67
  • 89