I got this message after converting a few columns from character to numeric: Warning message: Unknown or uninitialised column: 'df'.

I needed to load a csv file (from Qualtrics) into R.

filename <- "/Users/Study1.csv"
library(readr)
df <- read_csv(filename)

The first row contains the variable names, but the second and third rows are chunks of text that are not useful in R, so I needed to remove those two rows. However, because of those rows of text, R had already parsed columns 18 to the end as character, so I needed to convert these columns to numeric manually (which I need for further analysis).

# The 2nd and 3rd rows of the csv file are useless (they are strings)
df <- df[3:nrow(df), ]
# cols 18 to the end are supposed to be numeric, but the 2nd and 3rd rows are string, so R thinks that these columns contain strings
df[ ,18:ncol(df)] <- lapply(df[ ,18:ncol(df)], as.numeric)

After running the above code, the following warning appeared:

Warning message:
Unknown or uninitialised column: 'df'. 
Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
NAs introduced by coercion
NAs introduced by coercion

The NAs are fine, but the warning message is annoying. Is there a better way to convert my columns to numeric? Thank you all!

EDITED: Thank you all for your advice. I tried skipping the 2nd and 3rd rows, but one peculiar thing happened: because one cell contains multiple lines separated by empty lines, R parsed it incorrectly. (I blurred the original text in the screenshot.) It happens whether or not I tick "First Row as Names". Can you suggest a fix? Thanks again.

UPDATE on 2018-05-30: I've solved the problem. Please see my answer below or visit How to import Qualtrics data (in csv format) into R

JetLag
    Options: a. Fix the CSV; b. Use `skip = 3` and supply column names; c. Use `type_convert` after the fact. – alistaire Oct 31 '17 at 01:30
    You could `readLines` the text in, drop the parts, then feed to the read function of your choice - `read.table(text=readLines("textfile")[-(2:3)], header=TRUE)` for instance. – thelatemail Oct 31 '17 at 01:46

2 Answers

You can specify the column types in readr::read_csv

df <- readr::read_csv(file_name, col_types = "c")

from ?readr::read_csv

Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or _/- to skip the column.

working example

df <- readr::read_csv("  ,    ,      
                         ,    ,      
                      idx, key, value
                         ,    ,  
                        1, foo,   196
                        2, bar,   691",
                      skip = 2,
                      col_names = TRUE,
                      col_types = "ncd")

df <- dplyr::slice(df, 2:n())

df
# # A tibble: 2 x 3
#   idx   key value
# <dbl> <chr> <dbl>
# 1   1   foo   196
# 2   2   bar   691

This assumes the number of rows between the header and the data is consistent; if that is subject to change, it will require a different strategy.
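
One possible strategy that does not depend on the row count, in the spirit of the `type_convert` option mentioned in the comments, is sketched below. It is only a sketch under assumptions: the file contents and the column names `id`, `q1` and `q2` are made up, and it assumes `id` is numeric in every genuine data row, which may not hold for a real export.

library(readr)

# Hypothetical miniature export: header row, two descriptive rows, then data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,q1,q2",
  "Response ID,Question 1 text,Question 2 text",
  "{ImportId:a},{ImportId:b},{ImportId:c}",
  "1,4,2.5",
  "2,5,3.0"
), tmp)

# Read every column as character so the descriptive rows cannot affect type guessing
raw <- read_csv(tmp, col_types = cols(.default = col_character()))

# Keep only rows whose (assumed numeric) id column parses as a number
keep <- !is.na(suppressWarnings(as.numeric(raw$id)))

# Re-guess the column types from the remaining rows
dat <- type_convert(raw[keep, ], col_types = cols())

Because the rows are selected by a condition rather than by position, the same code would cope with one, two or more descriptive rows, as long as the assumption about `id` holds.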

Kevin Arseneau

Thank you all for your advice and comments. I followed @alistaire's advice to use skip.

As for the line breaks inside the Qualtrics cell, I found that I could click on "More options" when exporting the data and select "remove line breaks".

Following the advice from Skip specific rows using read.csv in R, I used the following code to solve my problem.

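# Read only the first row to get the column names
headers <- read.csv(filename, header = FALSE, nrows = 1, as.is = TRUE)
# Skip the name row and the two descriptive rows, then read the data
df <- read.csv(filename, skip = 3, header = FALSE)
colnames(df) <- unlist(headers)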
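
To check that this two-pass read leaves the data columns numeric, here is a small self-contained sketch. The file contents and the column names `id`, `q1` and `q2` are made up for illustration; a real Qualtrics export will look different.

# Hypothetical miniature export: header row, two descriptive rows, then data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,q1,q2",
  "Response ID,Question 1 text,Question 2 text",
  "{ImportId:a},{ImportId:b},{ImportId:c}",
  "1,4,2.5",
  "2,5,3.0"
), tmp)

# First pass: column names only
headers <- read.csv(tmp, header = FALSE, nrows = 1, as.is = TRUE)
# Second pass: data only, so the descriptive rows never influence the column types
demo <- read.csv(tmp, skip = 3, header = FALSE)
colnames(demo) <- unlist(headers)

# Inspect the resulting column types
str(demo)

Since the descriptive rows are never read in the second pass, no manual `as.numeric` conversion should be needed afterwards.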
JetLag