0

My CSV consists of rows with an unequal number of items, e.g.:

Customer1, 01XX, 02XY, 05XYZ, 100XYZ, 03X23
Customer2, 02XX, 012X, 05XYZ
Customer3, 01XX, 02XY, 05XYZ, 012X, 005XZZ, 100XYZ

etc.

How can I read such data from a CSV file into an R object so that each row can be referenced by its first item (Customer_x)? I would like to use it much like a data frame and parametrise the lookup, something like:

myVector <- myCSVdata[CustomerID]
myVector
[1] 02XX, 012X, 05XYZ

EDIT: Why the downvotes? How can I improve the question? Maybe there is a more effective way than reading it as a CSV when the rows are unequal, perhaps a list of vectors? I am not sure.

EDIT2: As an attempt at self-moderation: I found a differently worded question about a similar problem, with an answer that generates a list of vectors. I will experiment with that too: https://stackoverflow.com/a/18922750/3480717

Jacek Kotowski

2 Answers

4

I like using data.table::fread to read txt/csv files.

I copied your example data into a CSV file and imported it. Note that I set header = FALSE, since the data does not appear to have a header row.

df <- data.table::fread("Test.csv", header = FALSE)
df

Results

          V1   V2   V3    V4     V5     V6     V7
1: Customer1 01XX 02XY 05XYZ 100XYZ  03X23       
2: Customer2 02XX 012X 05XYZ                     
3: Customer3 01XX 02XY 05XYZ   012X 005XZZ 100XYZ
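To look a customer up by name, the wide table can be turned into a named list of vectors. The following is only a sketch: the inline text stands in for Test.csv, fill = TRUE is added as an assumption so that ragged rows are padded regardless of the data.table version, and the names lines, values and customers are hypothetical.

```r
library(data.table)

# Inline copy of the example rows, standing in for Test.csv.
lines <- c(
  "Customer1, 01XX, 02XY, 05XYZ, 100XYZ, 03X23",
  "Customer2, 02XX, 012X, 05XYZ",
  "Customer3, 01XX, 02XY, 05XYZ, 012X, 005XZZ, 100XYZ"
)

# fill = TRUE (an assumption, not in the call above) pads the shorter rows.
dt <- fread(text = lines, header = FALSE, sep = ",", fill = TRUE)

# Work on a plain data frame so that base R indexing applies.
df <- as.data.frame(dt)
values <- as.matrix(df[, -1])

# One character vector per customer, without the padding cells.
customers <- lapply(seq_len(nrow(values)), function(i) {
  x <- values[i, ]
  unname(x[!is.na(x) & nzchar(x)])
})
names(customers) <- df$V1

customers[["Customer2"]]
```

A list avoids the padding altogether, at the cost of losing the rectangular shape that data frame functions expect.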
DJV
1

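To read the data, set the read.table arguments to match the file: header = FALSE because there is no header row, sep = "," for comma-separated fields, and fill = TRUE so that shorter rows are padded with empty fields.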

myCSVdata <- read.table(text = "
Customer1, 01XX, 02XY, 05XYZ, 100XYZ, 03X23
Customer2, 02XX, 012X, 05XYZ
Customer3, 01XX, 02XY, 05XYZ, 012X, 005XZZ, 100XYZ
", header = FALSE, sep = ",", fill = TRUE, stringsAsFactors = FALSE)

row.names(myCSVdata) <- myCSVdata[[1]]
myCSVdata[1] <- NULL
myCSVdata
#             V2    V3     V4      V5      V6      V7
#Customer1  01XX  02XY  05XYZ  100XYZ   03X23        
#Customer2  02XX  012X  05XYZ                        
#Customer3  01XX  02XY  05XYZ    012X  005XZZ  100XYZ

After that, retrieving a customer's items is just a matter of subsetting the data frame by row name.

myVector <- unlist(myCSVdata["Customer2", ])
myVector
#      V2       V3       V4       V5       V6       V7 
# " 02XX"  " 012X" " 05XYZ"       ""       ""       ""

unname(myVector)
#[1] " 02XX"  " 012X"  " 05XYZ" ""       ""       ""
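The result above still carries the leading spaces from the file and the empty padding cells. A possible variation, sketched here under the assumption that neither is wanted, adds strip.white = TRUE and colClasses = "character" to the read.table call and filters out the empty strings. The helper name customer_items is hypothetical.

```r
# Same example data; strip.white = TRUE trims the space after each comma,
# and colClasses = "character" keeps every field as a string.
myCSVdata <- read.table(text = "
Customer1, 01XX, 02XY, 05XYZ, 100XYZ, 03X23
Customer2, 02XX, 012X, 05XYZ
Customer3, 01XX, 02XY, 05XYZ, 012X, 005XZZ, 100XYZ
", header = FALSE, sep = ",", fill = TRUE, strip.white = TRUE,
   colClasses = "character")

# Use the first column as row names, then drop it.
row.names(myCSVdata) <- myCSVdata[[1]]
myCSVdata[1] <- NULL

# Hypothetical helper: the items of one customer, without padding cells.
customer_items <- function(data, id) {
  x <- unlist(data[id, ], use.names = FALSE)
  x[!is.na(x) & nzchar(x)]
}

CustomerID <- "Customer2"
myVector <- customer_items(myCSVdata, CustomerID)
myVector
#[1] "02XX"  "012X"  "05XYZ"
```

Because the customer is passed as an ordinary string, the lookup can be parametrised with any variable that holds a customer name.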
Rui Barradas