I haven't really used R before, but I need to split a column of strings from a CSV with 440 entries into 2 columns in a table. The strings have different lengths.

An example is ACTL6A_S5. I would like everything before the _ in one column and everything after the _ in another column, and then to export the result as a CSV again. Is a for loop the best way to do this, or where should I start?

Currently I've managed to import the CSV into RStudio and display the column I want:

biological_dataset <- read.csv("Exampledata.csv") # Read the CSV file into a data frame
# print(biological_dataset) # Print the data read from the CSV file

feature_name_example <- as.character(biological_dataset$X[1])
as.character(biological_dataset$X[1:440])

R output: [screenshot: current output from CSV]

Expected result, something like:

  Column1 Column2
1   S1     ACTL6A
2   S2     ADAMTS1
camille

2 Answers

If I understand it correctly, the following should achieve what you want:

library("tidyr")
fixed <- separate(data = biological_dataset, col = X, into = c("Column1", "Column2"), sep = "_")

write.csv(x = fixed, file = "fixed_dataset.csv")

In brief: take column X from the given dataset and split each value at the underscore into two new columns with the names provided.
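To make the behaviour concrete, here is a minimal, self-contained sketch of the same call on a two-row toy data frame. Only the column name X and the value format come from the question; the toy values, the temporary output path, and the use of `row.names = FALSE` are assumptions added for illustration.

```r
library(tidyr)

# Toy stand-in for the real CSV (assumed values; only the column name X is from the question).
biological_dataset <- data.frame(
  X = c("ACTL6A_S5", "ADAMTS1_S2"),
  stringsAsFactors = FALSE
)

# Split X at the underscore into two new columns; X itself is dropped by default.
fixed <- separate(
  data = biological_dataset,
  col = X,
  into = c("Column1", "Column2"),
  sep = "_"
)

print(fixed)

# row.names = FALSE keeps R's row numbers out of the exported file.
# The path below is a placeholder in the session's temporary directory.
write.csv(x = fixed, file = file.path(tempdir(), "fixed_dataset.csv"), row.names = FALSE)
```

If some values contain more than one underscore, `separate()` warns and discards the extra pieces by default; its `extra = "merge"` argument keeps everything after the first underscore together in the last column instead.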

giocomai

Here is an option using base R:

# as.character() guards against X having been read in as a factor
out <- cbind(biological_dataset, read.table(text = as.character(biological_dataset$X),
       sep = "_", header = FALSE, col.names = c("Column1", "Column2")))
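As a rough sketch of how this base R approach plays out end to end, the following runs it on a two-row toy data frame and writes the result back to CSV. The toy values, the intermediate name `parts`, and the temporary output path are assumptions for illustration; only the column name X and the `read.table()` call come from the answer.

```r
# Toy stand-in for the real CSV (assumed values; only the column name X is from the question).
biological_dataset <- data.frame(
  X = c("ACTL6A_S5", "ADAMTS1_S2"),
  stringsAsFactors = FALSE
)

# Treat each string as one line of underscore-delimited text and parse it into two columns.
parts <- read.table(
  text = as.character(biological_dataset$X),
  sep = "_",
  header = FALSE,
  col.names = c("Column1", "Column2"),
  stringsAsFactors = FALSE
)

# Keep the original column alongside the two new ones.
out <- cbind(biological_dataset, parts)
print(out)

# Export without R's row numbers; the path is a placeholder in the temporary directory.
write.csv(out, file.path(tempdir(), "split_dataset.csv"), row.names = FALSE)
```

Unlike the tidyr version, this keeps the original X column in the result, and `read.table()` stops with an error if a row does not yield exactly the expected number of fields.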
akrun