
I am very new to R and would like some help, please. I have a .txt file whose data look like this:

14853 C001    1 Apples                                                      Apples
14854 BX0     0 Oranges                                                     Oranges
14855 F00058  0 Apples and Oranges in the, basket                           Apples and Oranges in the, [basket]

The columns have no headers, and I am trying to organize the data into a data frame with columns like this:

'14853' 'C001' '1' 'Apples' 'Apples'
'14854' 'BX0' '0' 'Oranges' 'Oranges'
'14855' 'F00058' '0' 'Apples and Oranges in the, basket' 'Apples and Oranges in the, [basket]'

Is there any way to do this in R?

I have tried many different approaches with read.table(), fread(), scan(), etc.

und3rd06012
    See `?read.fwf`: what you have is a fixed-width file. Also see https://stackoverflow.com/questions/14383710/read-fixed-width-text-file – thelatemail Mar 12 '18 at 23:43

1 Answer


To parse your input file, you need to determine its column widths. As noted by @thelatemail, you have a fixed-width format and could use the base function read.fwf to solve this.
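For comparison, here is a minimal sketch of the base-R route with read.fwf(), assuming the same widths as the read_fwf() call below (6, 7, 2, 60, 36). The sample rows are rebuilt with sprintf() and written to a temporary file so the padding is exact; that file stands in for your real .txt file. strip.white and colClasses are not read.fwf() arguments themselves; they are passed through to read.table().

```r
# Rebuild the sample rows with exact padding: 6 + 7 + 2 + 60 characters,
# followed by the rest of the line
rows <- sprintf("%-6s%-7s%2s%-60s%s",
                c("14853", "14854", "14855"),
                c("C001", "BX0", "F00058"),
                c("1", "0", "0"),
                c(" Apples", " Oranges", " Apples and Oranges in the, basket"),
                c("Apples", "Oranges", "Apples and Oranges in the, [basket]"))

# A temporary file plays the role of the real .txt file
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

df_base <- read.fwf(
  path,
  widths = c(6, 7, 2, 60, 36),
  strip.white = TRUE,  # drop the padding spaces around each field
  colClasses = c("integer", "character", "integer", "character", "character")
)

str(df_base)
```

Columns get the default names V1 to V5; pass col.names if you want your own.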

Here is a solution using read_fwf from the readr package:

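library(readr)

# Sample data from the question, joined into one string with "\n" between rows
txt <- paste(
  "14853 C001    1 Apples                                                      Apples",
  "14854 BX0     0 Oranges                                                     Oranges",
  "14855 F00058  0 Apples and Oranges in the, basket                           Apples and Oranges in the, [basket]",
  sep = "\n"
)

# Widths 6, 7, 2, 60 and 36 cover the five columns; the padding spaces count toward each width.
# (Recent readr versions may warn unless literal text like this is wrapped in I(txt).)
df <- read_fwf(txt, fwf_widths(c(6, 7, 2, 60, 36)))
df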

# # A tibble: 3 x 5
#      X1 X2        X3 X4                                X5                                 
#   <int> <chr>  <int> <chr>                             <chr>                              
# 1 14853 C001       1 Apples                            Apples                             
# 2 14854 BX0        0 Oranges                           Oranges                            
# 3 14855 F00058     0 Apples and Oranges in the, basket Apples and Oranges in the, [basket]

N.B. In a fixed-width file the whitespace counts toward each column's width, because there is no other delimiter. Column types are guessed with the same logic as the other functions in the family, such as read_csv; alternatively, set them explicitly with col_types. Since your input has no header, the col_names argument lets you supply column names.
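read_fwf() also accepts a file path directly, so a large file does not need to be pasted together first. A minimal sketch, assuming the same layout as the sample above: the column names are hypothetical placeholders, NA as the last width lets the final field run to the end of each line, and problems() lists any rows that failed to parse, which can help track down parsing-failure warnings.

```r
library(readr)

# Same fixed-width layout written to a temporary file, standing in for a real file
rows <- sprintf("%-6s%-7s%2s%-60s%s",
                c("14853", "14854", "14855"),
                c("C001", "BX0", "F00058"),
                c("1", "0", "0"),
                c(" Apples", " Oranges", " Apples and Oranges in the, basket"),
                c("Apples", "Oranges", "Apples and Oranges in the, [basket]"))
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

df <- read_fwf(
  path,  # read straight from disk; no paste() needed
  col_positions = fwf_widths(
    c(6, 7, 2, 60, NA),  # NA: last field runs to the end of each line
    col_names = c("id", "code", "flag", "short_desc", "long_desc")  # hypothetical names
  ),
  col_types = cols(
    id = col_integer(),
    code = col_character(),
    flag = col_integer(),
    short_desc = col_character(),
    long_desc = col_character()
  )
)

problems(df)  # one row per parsing failure; empty when everything parsed
```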

Kevin Arseneau
  • Thank you all for your answers! I have one small problem, though. My txt file has 94127 lines, so I guess I can't use paste the way you do. What I am doing is: `df <- read_fwf('icd10cm_order_2018.txt', fwf_widths(c(6, 8, 2, 61, 100)), col_types = cols(X1 = col_integer(), X2 = col_character(), X3 = col_integer(), X4 = col_character() , X5 = col_character()) )` and I get a warning: `Warning: 74188 parsing failures.`, and at the end of the warning I get: `In rbind(names(probs), probs_f) : number of columns of result is not a multiple of vector length (arg 1)` – und3rd06012 Mar 18 '18 at 11:23
  • The data are being stored in the data frame (df) correctly, but I would like to get rid of that warning if possible. I should also add that the widths in my file differ from the ones in the sample in my question, so please use the widths given in my comment above. – und3rd06012 Mar 18 '18 at 11:25