2

I have this example dataset:

sub_id,age,country,score
{subID},{Age},{CountryOfOrigin},{Qscore}
1,23,UK,15
2,28,uk,19
3,40,United Kingdom,33
4,19,france,21
5,36,Italy,16
6,24,UK,18
7,26,greece,16
8,22,italy,15

I'd like to read this in and perform some computations/analyses. I want the header row, but the row causes problems. I tried reading it in and dropping the first row ({with these}... it's a nonsense row), but because of the mixed datatypes within the column when it was read in, R won't let me perform the computations on anything because the data are not numeric anymore.

This is an example of a much larger dataframe, so I can't do that and manually specify the columns that I want to change to numeric.

It seems like the best solution would be to read the csv file in, with the header, but skip the first row.

df <- read.csv('scores.csv',
               header=TRUE,
               skip=1)

This works, but it converts all of my column names! For example df$Qscore becomes df$X.Qscore., which is obviously not ideal. I can at least perform the computations on that, but I don't know what I'm doing wrong.

I also tried reading in just the headers and then the data without the headers, and sticking them together but there were lots of issues with that too. This has to be such a common issue...

Note: I'm new to R and I have an issue that seems like it would be very common, but I'm unable to find the answer on here (probably because I don't know what to search for?), so apologies if this is a massive duplicate...

zx8754
  • 52,746
  • 12
  • 114
  • 209
fffrost
  • 1,659
  • 1
  • 21
  • 36
  • You could use `readLines` to read in the file as a character vector, then delete the second element of the vector, then use `textConnection` to pass the vector to `read.csv` – Michael Bird Aug 10 '18 at 08:51
  • @SaurabhChauhan I did see this, and I wanted to know if there was a more straightforward solution. The top comment there does work, although the other one (just trimming out the second row after the read.csv() call) doesn't work because the column datatypes are bad. Is this readLines & textConnection really the best solution? It seems like such a common issue that we shouldn't have to go to this much effort to achieve this. – fffrost Aug 10 '18 at 09:02

1 Answers1

3

We could abuse comment.char option:

read.table(text = "sub_id,age,country,score
{subID},{Age},{CountryOfOrigin},{Qscore}
1,23,UK,15
2,28,uk,19
3,40,United Kingdom,33
4,19,france,21
5,36,Italy,16
6,24,UK,18
7,26,greece,16
8,22,italy,15", sep = ",", comment.char = "{", header = TRUE)
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • Nice one, would never remind myself of that. Note that `read.csv` also works and doesn't need `sep` nor `header`. – Rui Barradas Aug 10 '18 at 09:01
  • @RuiBarradas True, I am just used to more generic `read.table` function with more options. `read.csv` is just a pretty wrapper for `read.table` anyway. – zx8754 Aug 10 '18 at 09:04
  • In my real dataset (not the one in the example) there are more issues - it isn't just the "{" but some other things too and I don't want to have to account for each individually, rather just a more reliable way of dealing with it. Thanks anyway – fffrost Aug 10 '18 at 09:04
  • 1
    @fffrost then you should post example of your data that is representative or your real data. In any case, the linked post should deal with any file. – zx8754 Aug 10 '18 at 09:05