0

I want to select some portion from file1.txt and save that in another file named file2.txt. File1.txt:

abc 1 6 a
abc 2 7 b
efg 3 8 c
abc 4 9 d
efg 5 10 e

Database query that I want to apply here is (not syntax specific):

file2.txt <- select col2, col3, col4 from file1.txt where col1=abc

File2.txt:

1 6 a
2 7 b
4 9 d

Is there any way to apply database type query in R on a text file? I know that we can search a letter in .txt file by using grep() function. But I'm unable to find any online help for this? Does anyone know how can I resolve my problem? Thank you in advance :) Please don't mark this question duplicate to:

Searching for string within a text file in R

This question is something different. Moreover I cannot use sqldf because this package is not for .txt files.

Knowledge Seeker
  • 526
  • 7
  • 25

3 Answers3

1

does below help answer to subset data from csvfile?

library(sqldf);
read.csv.sql(file, sql = "select * from file", header = TRUE, sep = ",")

Description Read a file into R filtering it with an sql statement. Only the filtered portion is processed by R so that files larger than R can otherwise handle can be accommodated.

MSW Data
  • 441
  • 3
  • 8
1

That should be all that's needed, mind that like() from the data.table package uses grepl internally, so I think regex might be an option as well.

library(data.table)
# Depending on the characteristics of the csv file this call has to be adjusted
dt <- data.table(read.csv("File1.txt", header = FALSE, sep = " "))
# or
dt <- fread("test.txt")
# data.table looks like this after import
dt <- structure(list(V1 = structure(c(1L, 1L, 2L, 1L, 2L)
      , .Label = c("abc", "efg")
      , class = "factor")
      , V2 = 1:5
      , V3 = 6:10
      , V4 = structure(1:5, .Label = c("a", "b", "c", "d", "e")
      , class = "factor")), row.names = c(NA, -5L)
      , class = c("data.table", "data.frame"))

write.csv(dt[like(V1, "abc"), .(V2
                      , V3
                      , V4
                      )],file = "File2.txt", row.names = FALSE)
hannes101
  • 2,410
  • 1
  • 17
  • 40
1

Assuming the file reproducibly created in the Note at the end:

library(sqldf)

read.csv.sql("File1.txt", 
  "select V2, V3, V4 from file where V1 = 'abc'", header = FALSE, sep = " ")

giving:

  V2 V3 V4
1  1  6  a
2  2  7  b
3  4  9  d

Note

Lines <- "abc 1 6 a
abc 2 7 b
efg 3 8 c
abc 4 9 d
efg 5 10 e
"
cat(Lines, file = "File1.txt")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341