I am new to R and am having a lot of trouble just reading in a .csv file and converting it into a data.frame with 7 columns. Here is what I am doing:
gene_symbols_table <- as.data.frame(read.csv(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE, sep=","))
After that I get a data.frame with dim = 46761 x 1, but I need it to be 46761 x 7. I tried the approaches suggested in other Stack Overflow threads, but nothing works in my case. Here is how the table looks:
> head(gene_symbols_table, 3)
  input.reason.matches.organism.name.primaryIdentifier.symbol.briefDescription.class.secondaryIdentifier
1 WBGene00008675 MATCH 1 Caenorhabditis elegans WBGene00008675 irld-26 Gene F11A5.7
2 WBGene00008676 MATCH 1 Caenorhabditis elegans WBGene00008676 oac-15 Gene F11A5.8
3 WBGene00008677 MATCH 1 Caenorhabditis elegans WBGene00008677 Gene F11A5.9
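A one-column result whose single column name fuses all the field names usually suggests that the separator passed to read.csv does not match the one used in the file. A minimal sketch of how this could be checked before parsing is given here; the sample lines and the temporary file are hypothetical stand-ins for the real file, and the semicolon layout is an assumption based on the raw rows shown further down in this post.

```r
# Hypothetical sample standing in for the first lines of the real file.
# The semicolon-separated layout is an assumption, not read from the file itself.
sample_lines <- c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# Look at the raw text first: the delimiter is usually obvious by eye
raw_head <- readLines(tmp, n = 3)
print(raw_head)

# Count the fields per line under each candidate separator
fields_comma     <- count.fields(tmp, sep = ",", quote = "\"")
fields_semicolon <- count.fields(tmp, sep = ";", quote = "\"")
print(fields_comma)
print(fields_semicolon)
```

The separator that gives the same field count on every line, and more than one field, is the likely candidate. On the real file, the path would replace `tmp`.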
The .csv file in Excel looks like this:
input | reason | matches | organism.name | primaryIdentifier | symbol | briefDescription
WBGene00008675 | MATCH | 1 | Caenorhabditis elegans | WBGene00008675 | irld-26 | ...
...
The following code:
gene_symbols_table <- read.table(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=FALSE, sep=",",
                                 col.names = paste0("V",seq_len(7)), fill = TRUE)
seems to work, but when I check dim I can see right away that the result is wrong: 20124 x 7. The first rows look like this:
V1
1 input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
2 WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
3 WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8
V2 V3 V4 V5
1
2
3
1
So the whole line ends up in V1 and the other columns are empty, which is wrong.
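The raw rows above contain semicolons rather than commas, so one thing that may be worth trying is reading with sep = ";". The sketch below uses a hypothetical inline sample (`sample_text`) copied from the rows shown in this post instead of the real file. Note also that the header shown lists nine field names rather than seven, so under this assumption the result would have nine columns.

```r
# Hypothetical inline sample copied from the rows shown above;
# on the real data, file = "<path to the .csv>" would replace text = sample_text.
sample_text <- paste(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8",
  sep = "\n"
)

# read.csv already returns a data.frame, so no as.data.frame() wrapper is needed
gene_symbols_table <- read.csv(text = sample_text, header = TRUE, sep = ";",
                               stringsAsFactors = FALSE)

print(dim(gene_symbols_table))
print(names(gene_symbols_table))
```

read.csv2 is a variant of read.csv whose defaults are sep = ";" and dec = ",", so it may also fit if the file was exported with a locale that uses semicolons as the list separator.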
Other attempts with read.table give me the error described in the second Stack Overflow thread.
I have also tried splitting the one-column data.frame into 7 columns, but so far without success.
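For the splitting approach, one possible sketch is to pass the single column back through read.table as text. The data.frame `one_col` below is a hypothetical stand-in for the one-column result, and the semicolon separator is again an assumption based on the rows shown earlier.

```r
# Hypothetical one-column data.frame mimicking a read with the wrong separator
one_col <- data.frame(
  V1 = c(
    "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
    "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
  ),
  stringsAsFactors = FALSE
)

# Re-parse each string as one line of semicolon-separated text.
# quote is limited to the double quote so apostrophes in descriptions are not
# treated as quoting characters, and comment.char is disabled.
split_table <- read.table(text = one_col$V1, sep = ";", header = FALSE,
                          quote = "\"", comment.char = "",
                          stringsAsFactors = FALSE)

print(dim(split_table))
```

If the separator is wrong from the start, re-reading the file with the right sep is likely simpler than splitting afterwards; this sketch only shows that the split is possible.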