13

The dataset I want to read in contains numbers both with and without a comma as the thousands separator:

"Sudan", "15,276,000", "14,098,000", "13,509,000"
"Chad", 209000, 196000, 190000

and I am looking for a way to read this data in.

Any hint appreciated!

zx8754
Karsten W.

4 Answers

20

Since there is an "r" tag on the question, I assume this is an R question. In R, you do not need to do anything special to handle the quoted commas:

> read.csv('t.csv', header=FALSE)
     V1          V2          V3          V4
1 Sudan  15,276,000  14,098,000  13,509,000
2  Chad      209000      196000      190000

# if you want to convert them to numbers:
> df <- read.csv('t.csv', header=FALSE, stringsAsFactors=FALSE)
> df$V2 <- as.numeric(gsub(',', '', df$V2))
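The same conversion can be applied to every numeric column at once. This is only a sketch: it inlines the two sample rows from the question through the `text` argument instead of reading 't.csv', and it assumes that the first column (V1) is the only non-numeric one.

```r
# Sample rows from the question, inlined so the sketch is self-contained.
csv_text <- '"Sudan", "15,276,000", "14,098,000", "13,509,000"
"Chad", 209000, 196000, 190000'

df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Assumption: every column except V1 holds numbers, possibly with commas.
num_cols <- setdiff(names(df), "V1")

# Remove the thousands separators, then convert each column to numeric.
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(gsub(",", "", x)))

str(df)
```

Columns that were already read as numbers pass through unchanged, since gsub() coerces them to character first and as.numeric() converts them back.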
xiechao
    I'd love it if read.csv (and read.table at root) took a 'thousands.sep' argument as a character to allow (and strip) in numeric data. For now I think the gsub() solution is all we have though. – Ken Williams Mar 02 '10 at 17:50
1

Looking at that set of data, you could parse it using ", " (note the extra space) as the separator instead of ",".
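In R, this idea might be sketched with strsplit(), since (as far as I know) read.csv only accepts a single-character separator. The sketch assumes that every field separator in the file is a comma followed by exactly one space, as in the sample, and that there are exactly four columns; the column names V1 to V4 are chosen here only to match R's defaults.

```r
lines <- c('"Sudan", "15,276,000", "14,098,000", "13,509,000"',
           '"Chad", 209000, 196000, 190000')

# Split on the literal two-character separator ", ".
fields <- strsplit(lines, ", ", fixed = TRUE)

# Drop the quote marks and the thousands separators left inside each field.
clean <- lapply(fields, function(f) gsub('[",]', "", f))
mat <- do.call(rbind, clean)

# Assumption: column 1 is a name, columns 2 to 4 are numbers.
df <- data.frame(V1 = mat[, 1], stringsAsFactors = FALSE)
df[paste0("V", 2:4)] <- lapply(2:4, function(j) as.numeric(mat[, j]))

str(df)
```

This breaks as soon as a name field itself contains ", ", so it is only suitable for data shaped like the sample.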

Mark Pope
0

You could use the following regular expression to remove the commas inside the numbers, along with any surrounding quote marks, leaving plain CSV content:

,(?=[0-9])|"

Then process it as normal.
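A sketch of how this could look in R, using the sample rows from the question. The lookahead `(?=...)` requires `perl = TRUE` in gsub(); `strip.white = TRUE` is added here to drop the space that follows each field separator.

```r
lines <- c('"Sudan", "15,276,000", "14,098,000", "13,509,000"',
           '"Chad", 209000, 196000, 190000')

# Remove quote marks and any comma that is immediately followed by a digit.
cleaned <- gsub(',(?=[0-9])|"', "", lines, perl = TRUE)
# cleaned[1] is now: Sudan, 15276000, 14098000, 13509000

# The result is plain CSV, so the numeric columns are detected automatically.
df <- read.csv(text = cleaned, header = FALSE, strip.white = TRUE)

str(df)
```

Note that this relies on the field separators in the sample being followed by a space. In a file written as `"Chad",209000,196000`, the pattern would also remove the separators in front of the numbers and merge the fields.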

Justin Wignall
-4

How about doing it as a two-step process? 1. Replace the "," with a TAB character. 2. Split on tab.

I'm assuming .NET here, but the same principle would apply in any language.

Raj
    A couple of comments: 1) the "r" tag means Karsten is using the "R" language, not .NET. 2) Replacing all commas with tabs wouldn't work; you'd end up splitting your data in bad ways. – Ken Williams Mar 02 '10 at 17:40