
I'm reading data into an R data frame using read.csv2. With one of my data sets, some of the values are somehow pushed onto the next line. This creates extra rows and cuts off columns. For illustration, my CSV looks like this:

var1,var2,var3
value1,value2,value3
value1,value2,value3

The data frame, however, turns out like this:

var1    var2
value1  value2
value3
value1  value2
value3

I have used the same command on many CSV files, including one containing a different sample of the exact same data, and never had this problem. Does anyone have an idea what could cause this?

Edit: As I am still not sure how to upload an actual data set, I have uploaded a screenshot of what it looks like. The splits occur in every line with more than 3 columns.
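The symptom can be reproduced with a small, made-up file. The file contents and the `tmp` path below are hypothetical and are not the asker's data.

The sketch assumes the cause is rows that have more fields than the first lines of the file. read.table(), which read.csv() and read.csv2() call, takes the column count from the first five lines. With fill = TRUE, which is the default for read.csv() and read.csv2(), the surplus fields on a longer row are wrapped onto a new row.

The example uses read.csv() because the illustration above is comma-separated. read.csv2() behaves the same way but expects ";" as the separator.

```r
# Hypothetical sample: the header and the first five data rows have
# 3 fields, the last two rows have 6.
tmp <- tempfile(fileext = ".csv")
writeLines(c("var1,var2,var3",
             rep("x1,x2,x3", 5),
             rep("x1,x2,x3,x4,x5,x6", 2)), tmp)

# The column count (3) is taken from the first five lines, so each
# 6-field row is split into two 3-field rows instead of raising an error.
df <- read.csv(tmp)
print(df)
print(dim(df))  # 5 short rows + 2 long rows split in two = 9 rows
```

In the result, rows 7 and 9 hold x4, x5 and x6 under var1 to var3. This is the same kind of split as in the data frame shown in the question.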

weissAa
  • Can you provide a reproducible example dataset? And also your desired output? – TobKel Feb 10 '20 at 12:57
  • The pattern seems to be incomplete or wrong: `var3` does not appear in the data frame. – Georgery Feb 10 '20 at 13:10
  • There is probably an issue with your CSV file, such as an extra `\n` or something like that. – jyr Feb 10 '20 at 13:16
  • I have found that Excel can add lots of extra commas (,) at the end of CSV file lines, and this can break other programs trying to read the data. I suggest you check your CSV files with a programmer's text editor. – Nigel Davies Feb 10 '20 at 13:53
  • @TobKel: can you tell me how to upload such an example dataset? – weissAa Feb 12 '20 at 09:28
  • @jyr: how can I check that? – weissAa Feb 12 '20 at 09:29
  • @NigelDavies: I have checked the file in Visual Studio Code, but there are no additional commas at the end of the lines... – weissAa Feb 12 '20 at 09:32
  • Commas at the end of the line would have a different result: the error "more columns than column names". Try to pinpoint the line where this split happens, and either post it here or try to find the problem manually yourself. You can also try opening the file in Excel or LibreOffice to see if there is a similar issue. – jyr Feb 12 '20 at 09:38
  • @weissAa You can copy your dataset as plain code with the `dput` function (for example, `dput(dataframe)`). – TobKel Feb 12 '20 at 09:42
  • @TobKel This is probably not going to help, as the issue is with the data before loading: it is already corrupted once loaded. – jyr Feb 12 '20 at 09:50
  • @TobKel, jyr is right. I am having trouble reading the CSV file correctly into R. – weissAa Feb 13 '20 at 13:17
  • @jyr, as I am still not sure how to upload a data set here, I have attached a screenshot of the data set to the original post. The splits occur in every line with more than 3 columns. – weissAa Feb 13 '20 at 13:21
  • @weissAa I have added an answer; hopefully it will fix your issue. – jyr Feb 13 '20 at 14:54
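The comments above suggest pinpointing the line where the split happens and sharing the data as text. Two base R tools can do both:

- count.fields() reports how many fields each line has under the given separator and quote settings.
- dput(readLines(...)) prints raw lines as R code that can be copied and pasted.

The sketch below uses a hypothetical `path` and made-up file contents. For a file meant for read.csv2(), use sep = ";".

```r
# Hypothetical file standing in for the real CSV.
path <- tempfile(fileext = ".csv")
writeLines(c("var1,var2,var3",
             "x1,x2,x3",
             "x1,x2,x3,x4,x5,x6",
             "x1,x2,x3"), path)

# Number of fields per line, using the same separator and quote
# character as read.csv(). Blank lines are skipped by default, so the
# indices can shift if the file contains empty lines.
n_fields <- count.fields(path, sep = ",", quote = "\"")
print(table(n_fields))

# Lines whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)

# Print the first few raw lines as R code that can be pasted into a post.
dput(readLines(path, n = 4))
```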

1 Answer


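The rows in your data have an unequal number of fields. read.table (which read.csv and read.csv2 call, with fill = TRUE by default) determines the number of columns from the first five lines of input, or from the length of col.names if that is longer. When a later row has more fields than that, the extra fields are wrapped onto a new row, which is exactly the split you are seeing. So instead of simply using read.csv, use read.table with fill = TRUE and give col.names at least as many names as the widest row has fields, so that your data structure is reflected. From your screenshot, there are at least 6 columns.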

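# dat: path to your CSV file. col.names needs at least as many names
# as the widest row has fields (at least 6 according to the screenshot).
read.table(dat, header = FALSE, sep = ",", fill = TRUE,
           col.names = paste0("col_name", 1:6))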
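Here is a sketch of the same approach on a small file. The contents and the `tmp` path are made up. The sketch makes two changes:

- It computes the widest row with count.fields() instead of hardcoding 6, so col.names is always long enough.
- It uses skip = 1 to drop the original 3-name header line, whose names would not cover the extra columns.

Short rows are padded with blank fields because of fill = TRUE.

```r
# Hypothetical sample: five 3-field rows followed by two 6-field rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("var1,var2,var3",
             rep("x1,x2,x3", 5),
             rep("x1,x2,x3,x4,x5,x6", 2)), tmp)

# Widest row in the whole file, so col.names covers every line.
max_fields <- max(count.fields(tmp, sep = ",", quote = "\""))

df <- read.table(tmp, header = FALSE, sep = ",", quote = "\"", fill = TRUE,
                 skip = 1,  # skip the original header line
                 col.names = paste0("V", seq_len(max_fields)))
print(df)
```

Each data row now stays on a single line. You can set meaningful column names afterwards, for example with names(df) <- ....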

For more information see this answer.

jyr