
This may be a stupid question, but for the life of me I can't figure out how to get Julia to read a csv file with column names that start with numbers and use them in DataFrames. How does one do this?

For example, say I have the file "test.csv" which contains the following:

,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23

If I just use readtable(), I get this:

julia> using DataFrames

julia> df = readtable("test.csv")
2x4 DataFrames.DataFrame
| Row | x    | x1Y | x2Y | x3Y |
|-----|------|-----|-----|-----|
| 1   | "1Y" | 11  | 12  | 13  |
| 2   | "2Y" | 21  | 22  | 23  |

What gives? How can I get the column names to be what they're supposed to be, "1Y", "2Y", etc.?

Robert Mah

3 Answers

The problem is that in DataFrames, column names are symbols, and a symbol literal can't start with a number. (You can still construct such a symbol with Symbol("1Y"), but you can't write it as :1Y.)

You can see this by doing e.g. typeof(:2), which returns Int64 rather than (as you might expect) Symbol. Thus, to turn your column names into a usable format, DataFrames has to prefix them with a letter: typeof(:x2) returns Symbol, so :x2 is a valid column name.
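These claims are easy to check in a REPL. This is a minimal sketch, assuming a Julia 1.x session (the question used an older release, where details may differ); `Base.isidentifier` is an unexported Base function used here only to show which names count as valid identifiers.

```julia
# `:2` is not a Symbol: quoting a numeric literal just gives back the number.
println(typeof(:2))    # Int64 on a 64-bit machine
println(typeof(:x2))   # Symbol

# A Symbol whose text starts with a digit can still be built with Symbol(...);
# it just can't be written with the :name literal syntax.
s = Symbol("1Y")
println(typeof(s))     # Symbol
println(string(s))     # 1Y

# Base.isidentifier reports whether a string is a valid Julia identifier,
# which is roughly whether it could be written after a `:` as a literal.
println(Base.isidentifier("1Y"))    # false
println(Base.isidentifier("x1Y"))   # true
```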

Nils Gudat

Unfortunately, readtable won't give you column names that start with numbers in DataFrames.

The code that parses the header names enforces this restriction.

I believe this is because of how parsing works in Julia: :aa is a symbol, while :2aa is not (which makes more sense once you consider that 1:2aa is the range 1:(2*aa), since 2aa means 2*aa).
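The renaming seen in the question's output (the empty header becomes `x`, `1Y` becomes `x1Y`) can be approximated in a few lines. This is a hypothetical sketch, not the code DataFrames actually uses: `mangle_name` is an illustrative name, and it treats only letters, digits and underscores as identifier characters (real Julia identifiers allow a few more, such as `!`).

```julia
# Hypothetical helper approximating how the question's header names were
# turned into valid identifiers; NOT DataFrames' actual implementation.
function mangle_name(s::AbstractString)
    isempty(s) && return :x                    # empty header field -> :x
    Base.isidentifier(s) && return Symbol(s)   # already valid, keep as-is
    # replace anything other than letters, digits and '_' with '_'
    cleaned = map(c -> (isletter(c) || isdigit(c) || c == '_') ? c : '_', s)
    # an identifier can't start with a digit, so prefix with "x" if needed
    if !(isletter(first(cleaned)) || first(cleaned) == '_')
        cleaned = "x" * cleaned
    end
    return Symbol(cleaned)
end

header = ["", "1Y", "2Y", "3Y"]     # first line of test.csv
println(mangle_name.(header))      # [:x, :x1Y, :x2Y, :x3Y]
```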

Felipe Lema

You could just use rename!() after the import:

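using DataFrames

# csv"""...""" is the string macro older DataFrames versions provided for
# parsing inline CSV text; it mangles the header names just like readtable
df = csv"""
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
"""
# map each mangled name back to the original header text
rename!(df, Dict(:x1Y => Symbol("1Y"), :x2Y => Symbol("2Y"), :x3Y => Symbol("3Y")))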

2×4 DataFrames.DataFrame
│ Row │ x    │ 1Y │ 2Y │ 3Y │
├─────┼──────┼────┼────┼────┤
│ 1   │ "1Y" │ 11 │ 12 │ 13 │
│ 2   │ "2Y" │ 21 │ 22 │ 23 │
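With many columns, writing that Dict by hand gets tedious. One way around it is to read the first line of the file yourself and pair each raw header field with the name DataFrames generated. This is a sketch under stated assumptions: `header_renames` is a hypothetical helper (not part of DataFrames), and it assumes a plain comma-separated header with no quoted fields and columns in file order.

```julia
# Hypothetical helper (not part of DataFrames): maps each generated column
# name to the raw header text from the first line of the CSV file.
# Assumes a plain comma-separated header with no quoted fields or commas
# inside names, and that columns appear in the same order as in the file.
function header_renames(current_names::AbstractVector, path::AbstractString)
    header = split(readline(path), ',')   # e.g. ["", "1Y", "2Y", "3Y"]
    length(header) == length(current_names) ||
        error("header has $(length(header)) fields, expected $(length(current_names))")
    # skip empty header fields (like the first one here) so they keep the
    # name DataFrames gave them
    return Dict(Symbol(n) => Symbol(h)
                for (n, h) in zip(current_names, header) if !isempty(h))
end

# Usage with the question's file contents written to a temporary path
path = tempname()
write(path, ",1Y,2Y,3Y\n1Y,11,12,13\n2Y,21,22,23\n")
renames = header_renames([:x, :x1Y, :x2Y, :x3Y], path)
println(renames[:x1Y] === Symbol("1Y"))   # true
```

With the data loaded, something like `rename!(df, header_renames(names(df), "test.csv"))` should then restore "1Y", "2Y" and "3Y" while leaving the empty first column as `x`. Because the helper converts each name with `Symbol`, it accepts either the Symbols that `names(df)` returned in older DataFrames versions or the Strings it returns in more recent ones.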

Still, you may run into problems later in your code, since you have to write Symbol("1Y") instead of :1Y every time you refer to such a column, so it's better to avoid column names that start with numbers.

Antonello