4

I need to add double quotes to specific columns in a csv file that my script generates.

Below is the goofy way I thought of doing this. For these two fixed-width fields, it works:

df['DATE'] = df['DATE'].str.ljust(9,'"')
df['DATE'] = df['DATE'].str.rjust(10,'"')

df['DEPT CODE'] = df['DEPT CODE'].str.ljust(15,'"')
df[DEPT CODE'] = df['DEPT CODE'].str.rjust(16,'"')

For the following field, it doesn't. It has a variable length. So, if the value is shorter than the standard 6-digits, I get extra double-quotes: "5673"""

df['ID'] = df['ID'].str.ljust(7,'"')
df['ID'] = df['ID'].str.rjust(8,'"')

I have tried zfill, but the data in the column is a series-- I get "pandas.core.series.Series" when i run

print type(df['ID'])

and I have not been able to convert it to string using astype. I'm not sure why. I have not imported numpy.

I tried using len() to get the length of the ID number and pass it to str.ljust and str.rjust as its first argument, but I think it got hung up on the data not being a string.

Is there a simpler way to apply double-quotes as I need, or is the zfill going to be the way to go?

mattrweaver
  • 729
  • 4
  • 14
  • 36

2 Answers2

8

You can add a speech mark before / after:

In [11]: df = pd.DataFrame([["a"]], columns=["A"])

In [12]: df
Out[12]:
   A
0  a

In [13]: '"' + df['A'] + '"'
Out[13]:
0    "a"
Name: A, dtype: object

Assigning this back:

In [14]: df['A'] = '"' + df.A + '"'

In [15]: df
Out[15]:
     A
0  "a"

If it's for exporting to csv you can use the quoting kwarg:

In [21]: df = pd.DataFrame([["a"]], columns=["A"])

In [22]: df.to_csv()
Out[22]: ',A\n0,a\n'

In [23]: df.to_csv(quoting=1)
Out[23]: '"","A"\n"0","a"\n'
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • I got an error "TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S21') dtype('S21') dtype('S21')" – mattrweaver Nov 06 '15 at 22:56
  • I got that error when I tried -- df['A'] = '"' + df.A + '"' – mattrweaver Nov 06 '15 at 23:04
  • 1
    Pandas doesn't use s21, it should be using object, is that numpy or an old version of pandas? – Andy Hayden Nov 06 '15 at 23:34
  • my pandas is 16.2 and I haven't imported numpy into the script – mattrweaver Nov 07 '15 at 00:16
  • 1
    @mattrweaver There's definitely something fishy going on, pandas should never use s21 (if you're reading csv, then it's [avoided](http://wesmckinney.com/blog/a-new-high-performance-memory-efficient-file-parser-engine-for-pandas/)... How are you importing your data / are you running it through a library that returns a numpy array? – Andy Hayden Nov 07 '15 at 00:28
  • the data comes in an email and its html that I have to copy and paste to excel and save as a csv file. That's the only way I can get the data. Then, I import that csv file using pd.read_csv – mattrweaver Nov 07 '15 at 00:30
  • @mattrweaver Do you think you could post some data somewhere or an example which which we can reproduce this? Very strange! – Andy Hayden Nov 07 '15 at 00:35
  • the line df['ID'] = '"' + df.ID + '"' gives me the error TypeError: cannot concatenate 'str' and 'numpy.ndarray' objects. The other two lines work just fine. I don't know how this could be a numpy array when numpy isn't imported in the script. I have tried this on a second python install and that data is the problem. As I am dealing with confidential data, I can't post it anywhere. I don't know why that column is a numpy array while the others aren't. – mattrweaver Nov 09 '15 at 01:33
  • @mattrweaver one way to try and debug/fix this is to read in the first half of the lines, see if it's there, otherwise look at the second half, and repeat til you find an offending short example - hopefully a single line which you can then anonymize and post here. :) – Andy Hayden Nov 09 '15 at 05:21
  • I wasn't sure if by creating a new csv file from scratch, I might eliminate something related to my problem if it came from the conversion from HTML. But, with the new csv file, I get the same behavior. Here is the link to that csv file: https://dl.dropboxusercontent.com/u/7775706/testdata.csv – mattrweaver Nov 09 '15 at 12:57
  • @mattrweaver on which column, I can't replicate this I don't think. – Andy Hayden Nov 09 '15 at 15:08
  • the ID column is the problem. – mattrweaver Nov 09 '15 at 15:11
  • here is a link to the code. You will see I have tried to deal with the quotes by reopening the file using csv and writing them in there -- getting triple quotes https://dl.dropboxusercontent.com/u/7775706/csvstack.py – mattrweaver Nov 09 '15 at 15:16
  • i marked your answer as the correct one because it worked on two of the three columns. THere is clearly something wrong with what I've done. I fixed the triple quotes, btw. :-) – mattrweaver Nov 09 '15 at 15:27
  • 1
    @mattrweaver Great!! Ah ha, you need to convert numeric columns to string before you can add strings (potentially that's only python 3?) http://stackoverflow.com/q/22005911/1240268 so `'"' + df.ID.astype(str) + '"'` – Andy Hayden Nov 09 '15 at 15:31
  • That got it! Dang, I had been trying to use astype(str) in a different construct on that column and was getting errors. Clearly, I wasn't going about it the right way. Thanks for your help. – mattrweaver Nov 09 '15 at 15:39
1

With numpy, not pandas, you can specify the formatting method when saving to a csv file. As very simple example:

In [209]: np.savetxt('test.txt',['string'],fmt='%r')
In [210]: cat test.txt
'string'

In [211]: np.savetxt('test.txt',['string'],fmt='"%s"')
In [212]: cat test.txt
"string"

I would expect the pandas csv writer to have a similar degree of control, if not more.

hpaulj
  • 221,503
  • 14
  • 230
  • 353