
I'm a beginner with R, and I hope you can help me with my question. The filenames in my dataset contain a lot of information, and I have to extract it to create separate variables.

To begin, I use:

```r
splits <- t(as.data.frame(strsplit(as.character(rawdata_r$File), "_")))
```

But when I run it, I get this error:

```
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 1, 4, 5, 2
```

What could be the problem? Thanks in advance for your help.

Asked by RenaSo; edited by De Novo.
  • Welcome to Stack Overflow! You can edit your question to add your clarifications; that's the best way to provide new information about your question. – De Novo Mar 28 '18 at 18:17
  • What is `rawdata_r$File` ? Specifically, what is the output of `class(rawdata_r$File)` and `length(rawdata_r$File)`? – De Novo Mar 28 '18 at 18:18
  • It's the column of the dataset that holds the names of the data files. The names contain some information (such as date, person number, etc.). It's a big file with over 23,000 entries. `class(rawdata_r$File)` says "factor". – RenaSo Mar 28 '18 at 18:32
  • @RenaSo Do all rows of `rawdata_r$File` have the same number of `_`-separated items? It may be worth sharing at least one row of `rawdata_r$File`. – MKR Mar 28 '18 at 18:41
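The comments establish that `File` is a factor. As a minimal sketch of why the question's code wraps it in `as.character()`: `strsplit()` only accepts character input and signals an error for a factor. The two filenames below are hypothetical stand-ins for `rawdata_r$File`, invented only to mirror the kind of information (date, person number) mentioned in the comments.

```r
# Hypothetical stand-in for rawdata_r$File: a factor, as class() reported
files <- factor(c("2018-03-28_p01_run1", "2018-03-29_p02_run2"))

# strsplit() rejects non-character input such as a factor;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch(strsplit(files, "_"), error = function(e) conditionMessage(e))
print(msg)

# Converting to character first gives one vector of pieces per filename
parts <- strsplit(as.character(files), "_")
print(parts[[1]])
```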

2 Answers

Your error is thrown by the `as.data.frame()` call. It turns each element of the list into a column, and all columns of a data frame must have the same number of rows.

Given the error message, `strsplit(as.character(rawdata_r$File), "_")` has produced a list whose elements contain 1, 4, 5, or 2 pieces. `rawdata_r$File` is a factor, which you convert to character before splitting. The message lists each distinct element length only once, so these numbers are the different piece counts found across your filenames: some names contain no `_` at all, while others contain 3, 4, or 1 of them. Perhaps these are words in snake_case.
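To see which piece counts actually occur in the data before choosing an approach, you can count the pieces per filename with `lengths()`. A minimal sketch on made-up filenames (hypothetical stand-ins for `rawdata_r$File`) that also reproduces the error and shows that the message reports each distinct count only once:

```r
# Hypothetical filenames; note that the counts 2 and 4 each occur twice
filenames <- c("foo", "a_b_c_d", "a_b_c_d_e", "hello_world", "x_y", "w_x_y_z")

splits <- strsplit(filenames, "_")

# Number of pieces per filename
n_pieces <- lengths(splits)
print(n_pieces)

# The distinct counts, in order of first appearance
print(unique(n_pieces))

# How often each count occurs, useful for spotting unusual filenames
print(table(n_pieces))

# Reproduce the error and capture its message
msg <- tryCatch(as.data.frame(splits), error = function(e) conditionMessage(e))
print(msg)
```

On the real data, `table(lengths(strsplit(as.character(rawdata_r$File), "_")))` would give the same kind of overview.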

Depending on what you want to use this object for, I would suggest simply removing the calls to `as.data.frame()` and `t()`. On its own, `strsplit()` already returns a list that splits each snake_case filename into its words.

See the following example:

```r
# create an object with similar characteristics
filenames <- factor(c("foo", "foo_bar_baz_fiz", "foo_bar_baz_fiz_buz", "hello_world"))

# generate the error:
splits <- t(as.data.frame(strsplit(as.character(filenames), "_")))
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
#>   arguments imply differing number of rows: 1, 4, 5, 2

# don't generate the error
splits <- strsplit(as.character(filenames), "_")
splits
#> [[1]]
#> [1] "foo"
#>
#> [[2]]
#> [1] "foo" "bar" "baz" "fiz"
#>
#> [[3]]
#> [1] "foo" "bar" "baz" "fiz" "buz"
#>
#> [[4]]
#> [1] "hello" "world"
```
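If the goal is still one variable per piece, as in the question, one option is to pad the shorter results with `NA` and bind the rows into a data frame. This is a minimal base-R sketch using the same toy `filenames` factor as the example above. It assumes that pieces line up by position (a filename with fewer pieces is simply missing its trailing fields), which may not hold for the real data; the column names `Part1`, `Part2`, ... are illustrative.

```r
# Same toy factor as in the example above
filenames <- factor(c("foo", "foo_bar_baz_fiz", "foo_bar_baz_fiz_buz", "hello_world"))
splits <- strsplit(as.character(filenames), "_")

# Pad every element with NA up to the length of the longest split
max_len <- max(lengths(splits))
padded <- lapply(splits, function(p) c(p, rep(NA_character_, max_len - length(p))))

# One row per filename, one column per piece
parts <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(parts) <- paste0("Part", seq_len(max_len))
print(parts)
```

If some filenames omit fields in the middle rather than at the end, positional columns would mix different kinds of information, so it is worth checking the piece counts first.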
Answered by De Novo

If every value in the OP's `File` column has the same number of `_`-separated items (say 4), `tidyr::separate()` offers an efficient solution:

```r
library(tidyverse)

rawdata_r %>%
  mutate(File = as.character(File)) %>%
  separate(File, c("Part1", "Part2", "Part3", "Part4"), sep = "_")
```

This splits `File` into four columns named `Part1`, `Part2`, `Part3` and `Part4`.
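For comparison, here is a base-R sketch of the same fixed-count split, under the same assumption that every `File` value has exactly four `_`-separated items. The small `rawdata_r` below, including its `id` column, is a hypothetical stand-in for the OP's data.

```r
# Hypothetical stand-in for rawdata_r; every File value has exactly four pieces
rawdata_r <- data.frame(
  id = 1:2,
  File = factor(c("a_b_c_d", "e_f_g_h"))
)

pieces <- strsplit(as.character(rawdata_r$File), "_")

# Guard the fixed-count assumption before building the matrix
stopifnot(all(lengths(pieces) == 4))

# rbind() the equal-length vectors into a 2 x 4 character matrix
parts <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(parts) <- c("Part1", "Part2", "Part3", "Part4")

# Drop the original File column and append the new columns
result <- cbind(rawdata_r[setdiff(names(rawdata_r), "File")], parts)
print(result)
```

This sketch appends the new columns after the remaining ones. If some rows have fewer or more pieces than the names given in `into`, `separate()` by default warns and fills with `NA` or drops the extra pieces; its `fill` and `extra` arguments (for example `fill = "right"`) control that behaviour.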

Answered by MKR