2

I've a data frame with some metadata in the first 3 rows, that I need to skip. But doing so, also affects the colnames of the values cols.

What can I do, to avoid opening every CSV on excel and deleting these rows manually?

This is how the CSV looks when opened in Excel:

enter image description here

In R, I'm using this command to open it:

android_per <- fread("...\\Todas las adquisiciones de dispositivos de Versión de Android PE.csv",
                     skip = 3)

And it looks like this:

enter image description here

UPDATE 1:

enter image description here

G5W
  • 36,531
  • 10
  • 47
  • 80
Omar Gonzales
  • 3,806
  • 10
  • 56
  • 120
  • Some useful previous answers for dealing with this sort of issue here - https://stackoverflow.com/questions/17797840/reading-two-line-headers-in-r/17798282 – thelatemail May 14 '20 at 00:42

4 Answers4

4

Similar logic to @G5W, but I think there needs to be a step of squashing the header that is in 2 rows back to one. E.g.:

txt <- "Some, utter, rubbish,,
Even more rubbish,,,,
,,Col_3,Col_4,Col_5
Col_1,Col_2,,,
1,2,3,4,5
6,7,8,9,0"
## below line writes a file - uncomment if you're happy to do so
##cat(txt, file="testfile.csv", "\n")

header <- apply(read.csv("testfile.csv", nrows=2, skip=2, header=FALSE), 
                2, paste, collapse="")
read.csv("testfile.csv", skip=4, col.names=header, header=FALSE)

Output:

#  Col_1 Col_2 Col_3 Col_4 Col_5
#1     1     2     3     4     5
#2     6     7     8     9     0
thelatemail
  • 91,185
  • 12
  • 128
  • 188
2

Here is one way to do it. Read the file simply as lines of text. Eliminate the lines that you don't want, then read the remaining good part into a data.frame.

Sample csv file (I saved it as "Temp/Temp.csv")

Col_1,Col_2,Col_3,Col_4,Col_5
Some utter rubbish,,,,
Presumably documentation,,,,
1,2,3,4,5
6,7,8,9,0

Code

CSV_Lines = readLines("temp/Temp.csv")
CSV_Lines = CSV_Lines[-(2:3)]
DF = read.csv(text=CSV_Lines)
  Col_1 Col_2 Col_3 Col_4 Col_5
1     1     2     3     4     5
2     6     7     8     9     0

It skipped the unwanted lines and got the column names.

G5W
  • 36,531
  • 10
  • 47
  • 80
1

If you use skip = 3, you definitely lose the column names without an option to get it back using R. An ugly hack could be to use skip = 2 which will make sure that all other columns except the first 2 are correct.

df <- read.table('csv_name.csv', skip = 2, header = TRUE)

The headers of the first 2 columns are in the first row so you can do

names(df)[1:2] <- df[1, 1:2]

Probably, you need to shift all the rows 1 step up to get dataframe as intended.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I wouldn't encourage this - it ends up breaking the guessed format of the columns by adding a character at the top. Better to suck in the data and the data only and then grab the names separately to overwrite later. – thelatemail May 14 '20 at 00:40
  • How do we grab only the names separately when they are in different rows? – Ronak Shah May 14 '20 at 00:54
  • Grab the two rows with `readLines` and then join them together - `pmax`? `paste` - something like that. I'd have a crack but i can't be bothered without a reproducible example. – thelatemail May 14 '20 at 00:57
0

In case you put Header as false then you can use below code:

df<-fread("~/Book1.csv", header = F, skip = 2)

shift_up <- function(x, n){
             c(x[-(seq(n))], rep(NA, n))
             }

df[1,1]<-df[2,1]
df[1,2]<-df[2,2]
df<-df[-2,]
names(df)<-as.character(df[1,])
df<-df[-1,]
Harshal Gajare
  • 605
  • 4
  • 16