Skipping rows gets rid off necessary colnames?

Question

I've a data frame with some metadata in the first 3 rows, that I need to skip. But doing so, also affects the colnames of the values cols.

What can I do, to avoid opening every CSV on excel and deleting these rows manually?

This is how the CSV looks when opened in Excel:

In R, I'm using this command to open it:

android_per <- fread("...\\Todas las adquisiciones de dispositivos de Versión de Android PE.csv",
                     skip = 3)

And it looks like this:

UPDATE 1:

Some useful previous answers for dealing with this sort of issue here - https://stackoverflow.com/questions/17797840/reading-two-line-headers-in-r/17798282 — thelatemail, May 14 '20 at 00:42

score 4 · Answer 1 · answered May 14 '20 at 01:16

Similar logic to @G5W, but I think there needs to be a step of squashing the header that is in 2 rows back to one. E.g.:

txt <- "Some, utter, rubbish,,
Even more rubbish,,,,
,,Col_3,Col_4,Col_5
Col_1,Col_2,,,
1,2,3,4,5
6,7,8,9,0"
## below line writes a file - uncomment if you're happy to do so
##cat(txt, file="testfile.csv", "\n")

header <- apply(read.csv("testfile.csv", nrows=2, skip=2, header=FALSE), 
                2, paste, collapse="")
read.csv("testfile.csv", skip=4, col.names=header, header=FALSE)

Output:

#  Col_1 Col_2 Col_3 Col_4 Col_5
#1     1     2     3     4     5
#2     6     7     8     9     0

score 2 · Answer 2 · answered May 14 '20 at 00:23

Here is one way to do it. Read the file simply as lines of text. Eliminate the lines that you don't want, then read the remaining good part into a data.frame.

Sample csv file (I saved it as "Temp/Temp.csv")

Col_1,Col_2,Col_3,Col_4,Col_5
Some utter rubbish,,,,
Presumably documentation,,,,
1,2,3,4,5
6,7,8,9,0

Code

CSV_Lines = readLines("temp/Temp.csv")
CSV_Lines = CSV_Lines[-(2:3)]
DF = read.csv(text=CSV_Lines)
  Col_1 Col_2 Col_3 Col_4 Col_5
1     1     2     3     4     5
2     6     7     8     9     0

It skipped the unwanted lines and got the column names.

score 1 · Answer 3 · answered May 14 '20 at 00:10

1

If you use skip = 3, you definitely lose the column names without an option to get it back using R. An ugly hack could be to use skip = 2 which will make sure that all other columns except the first 2 are correct.

df <- read.table('csv_name.csv', skip = 2, header = TRUE)

The headers of the first 2 columns are in the first row so you can do

names(df)[1:2] <- df[1, 1:2]

Probably, you need to shift all the rows 1 step up to get dataframe as intended.

answered May 14 '20 at 00:10

Ronak Shah

377,200
20
156
213

I wouldn't encourage this - it ends up breaking the guessed format of the columns by adding a character at the top. Better to suck in the data and the data only and then grab the names separately to overwrite later. – thelatemail May 14 '20 at 00:40
How do we grab only the names separately when they are in different rows? – Ronak Shah May 14 '20 at 00:54
Grab the two rows with `readLines` and then join them together - `pmax`? `paste` - something like that. I'd have a crack but i can't be bothered without a reproducible example. – thelatemail May 14 '20 at 00:57

score 0 · Answer 4 · answered May 14 '20 at 02:58

In case you put Header as false then you can use below code:

df<-fread("~/Book1.csv", header = F, skip = 2)

shift_up <- function(x, n){
             c(x[-(seq(n))], rep(NA, n))
             }

df[1,1]<-df[2,1]
df[1,2]<-df[2,2]
df<-df[-2,]
names(df)<-as.character(df[1,])
df<-df[-1,]

Skipping rows gets rid off necessary colnames?

4 Answers4