0

I have merged multiple CSV files with the following code:

library(data.table)
setwd("C:\\Users\\jessi\\Desktop\\Selenium\\TASK 001\\Extracted CSV\\test_batch")
files <- list.files(pattern = ".csv")
temp <- lapply(files, fread, sep="")
data <- rbindlist(temp)
write.csv(data, file="rmergeoutput2.csv", row.names = FALSE)

My entries are held in a single cell on each line and seperated by "|" as follows:

Year|RecID|ParID|ConParID|Country|Divisi..............
1861|2549365|281|186|ENG|I|London (Parts Of Middlesex.....
1861|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey ..........
1851|2549365|281|186|ENG|I|London (Parts Of Middlesex.....
1851|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey ..........

My dataframe isn't ordered and I'd like to order it in ascending order. In particular, I'd like for the Year, RecID, ParID, ConParID to all be displayed in an orderly manner. How can I do this? I have tried order(data) to no success.

Rye Rye
  • 57
  • 5

2 Answers2

2

You could include this before saving the file. (I shortend the rest of the data, i.e. the dots ".....")

tmp <- tempfile()
writeLines(text = c("Year|RecID|ParID|ConParID|Country|Divisiion|rest",
"1861|2549365|281|186|ENG|I|London (Parts Of Middlesex",
"1861|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey",
"1851|2549365|281|186|ENG|I|London (Parts Of Middlesex",
"1851|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey"),
con = tmp)


data <- fread(file = tmp, sep = "|")

data2 <- data[order(Year, RecID, ParID, ConParID)]

write.table(x = data2,file = tmp, sep = "|",quote = FALSE, row.names = FALSE)
readLines(tmp)


[1] "Year|RecID|ParID|ConParID|Country|Divisiion|rest"     
[2] "1851|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey"  
[3] "1851|2549365|281|186|ENG|I|London (Parts Of Middlesex"
[4] "1861|9|1|1|ENG|I|London (Parts Of Middlesex, Surrey"  
[5] "1861|2549365|281|186|ENG|I|London (Parts Of Middlesex"
Jakob Gepp
  • 463
  • 3
  • 10
1

It looks like your data is pipe-delimited. Use sep = '|' as an argument to your fread function. This will read your data into separate columns instead of a single string per row. Then you can sort your data as usual -- using order, dplyr::arrange, or the various options in data.table, shown in this answer .

Colin H
  • 600
  • 4
  • 9
  • Does this suit your suggestion:? ```library(data.table) library(tidyverse) setwd("C:\\Users\\jessi\\Desktop\\Selenium\\TASK 001\\Extracted CSV\\test_batch") files <- list.files(pattern = ".csv") temp <- lapply(files, fread, sep="|") data <- rbindlist(temp) order(data) write.csv(data, file="rmergeoutput2.csv", row.names = FALSE)``` – Rye Rye Feb 25 '21 at 14:54
  • 1
    That's nearly it; I would add the section that @Jacob Gepp offers below to do your sorting: `data2 <- data[order(Year, RecID, ParID, ConParID)]` – Colin H Feb 25 '21 at 14:59
  • I see, supposedly something like this: ```library(data.table) setwd("C:\\Users\\jessi\\Desktop\\Selenium\\TASK 001\\Extracted CSV\\test_batch") files <- list.files(pattern = ".csv") temp <- lapply(files, fread, sep="|") data <- rbindlist(temp) data2 <- data[order(Year, RecID, ParID, ConParID)] write.csv(data2, file="rmergeoutput2.csv", row.names = FALSE)``` I will run it and update you. Thank you for the help. – Rye Rye Feb 25 '21 at 15:14