
The .txt file I have starts with a text prologue describing the variables, followed by a pipe-delimited data table.

I want to load the table as a data frame and save the variable descriptions for reference. How can I do this?

Here is the link to the data: https://www.sao.ru/lv/lvgdb/article/suites_dw_Table1.txt

sam
  • The .txt file in text form would be great (to reproduce)! – TarJae Sep 22 '22 at 19:14
  • In my opinion you can achieve it with `readLines`, but I agree, the .txt file would be needed to provide a full answer. – Dominik Żabiński Sep 22 '22 at 19:18
  • Like @DominikŻabiński start with `readLines`, then note that the prologue is separated from the data table by a blank line and use this knowledge to keep the table only, then maybe something along the lines of [the answers to this question](https://stackoverflow.com/questions/52023709/what-can-r-do-about-a-messy-data-format). – Rui Barradas Sep 22 '22 at 19:22
  • https://www.sao.ru/lv/lvgdb/article/suites_dw_Table1.txt – sam Sep 22 '22 at 19:25
  • Not resolved. I have given the link to the file. Looking for a detailed solution. – sam Sep 22 '22 at 19:28
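Rui Barradas's blank-line suggestion can be sketched as follows. This is a minimal sketch, not a tested solution for the real file: it uses a small made-up `txt` vector in place of `readLines()` on the URL, and it assumes that the first blank line is the only separator between the prologue and the table. The column names and values are hypothetical.

```r
# Made-up stand-in for readLines("https://www.sao.ru/lv/lvgdb/article/suites_dw_Table1.txt");
# the real file's columns and layout may differ.
txt <- c(
  "Table 1. Example prologue",
  "name: object name",
  "dist: distance (Mpc)",
  "",
  "name    |dist ",
  "--------|-----",
  "ObjA    | 3.6 ",
  "ObjB    | 4.1 "
)

# Position of the first empty (or whitespace-only) line
first_blank <- which(!nzchar(trimws(txt)))[1]

prologue    <- txt[seq_len(first_blank - 1)]        # variable descriptions
table_lines <- txt[(first_blank + 1):length(txt)]   # header, underline, data rows

# Drop the underline (a line made only of dashes, pipes and spaces) before parsing
table_lines <- table_lines[!grepl("^[-|[:space:]]+$", table_lines)]

# strip.white = TRUE removes the padding around each pipe-separated cell
df <- read.table(text = table_lines, sep = "|", header = TRUE, strip.white = TRUE)
```

If the real prologue itself contains blank lines, `first_blank` would point into the prologue, so the dashed-underline approach is the more robust anchor.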

1 Answer


You can read the file with `readLines`, find the line containing "--------", then extract the line just before it (the header) and all the lines after it. Feed the result to `read.table`.

```r
# Read the raw file, one element per line
txt <- readLines("https://www.sao.ru/lv/lvgdb/article/suites_dw_Table1.txt")

# Index of the dashed line that separates the header row from the data
underline <- grep("--------", txt)

# Keep the header (the line just above the underline) plus every data row below it,
# and split the cells on the pipe character
df <- read.table(text = txt[c(underline - 1, (underline + 1):length(txt))], 
                 sep = "|", header = TRUE)

dplyr::tibble(df)
#> # A tibble: 796 x 12
#>    name        a_26   m_b log_lk log_m26 log_mhi   vlg   ti1 md        D delta~1
#>    <chr>      <dbl> <dbl>  <dbl>   <dbl> <chr>   <dbl> <dbl> <chr> <dbl>   <dbl>
#>  1 " HolmIX ~  2.96 -13.6   7.7     8.53 "  8.4~   192   5.1 " ME~  3.61      88
#>  2 " ClumpI ~  0.2   -8.3   5.57   NA    "     ~   -25   4.2 " ME~  3.6     -129
#>  3 " KDG061 ~  1.55 -12.9   8.09   NA    "     ~   360   4   " ME~  3.6      256
#>  4 " [CKT200~  0.88 -10.1   6.29   NA    "     ~   -46   4   " ME~  3.6     -150
#>  5 " ClumpII~  0.11  -8.3   5.57   NA    "     ~    19   3.9 " ME~  3.6      -85
#>  6 " NGC2976~  6.17 -17.1   9.42    9.15 "  8.0~   142   2.9 " ME~  3.56      38
#>  7 " MESSIER~ 13.2  -19.6  10.6     9.86 "  8.9~   328   2.8 " ME~  3.53     224
#>  8 " KDG064 ~  2.19 -12.6   7.98   NA    "     ~   121   2.7 " ME~  3.7       17
#>  9 " [CKT200~  0.87  -9.6   6.8    NA    "     ~    NA   2.5 " ME~  3.66      NA
#> 10 " IKN    ~  3.15 -11.6   7.6    NA    " <6.2~    -1   2.5 " ME~  3.75    -105
#> # ... with 786 more rows, 1 more variable: count <int>, and abbreviated
#> #   variable name 1: delta_vlg
```

Created on 2022-09-22 with reprex v2.0.2
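The character columns in the output above still carry the spaces that pad each cell (e.g. `" HolmIX ~"`). A minimal sketch of trimming them with `trimws` after the same underline-based read is below; it runs on a made-up `txt` vector rather than the real file, so the column names `name`, `dist` and `flag` are hypothetical.

```r
# Hypothetical stand-in for the downloaded lines: a short prologue, a header,
# the dashed underline and two padded, pipe-separated data rows.
txt <- c(
  "Description of the columns (made up)",
  "",
  "name|dist|flag",
  "--------|------|------",
  " ObjA   |  3.6 | <6.2 ",
  " ObjB   |  4.1 |  8.4 "
)

# Same steps as above: locate the underline, keep the header and the data rows
underline <- grep("--------", txt, fixed = TRUE)
df <- read.table(text = txt[c(underline - 1, (underline + 1):length(txt))],
                 sep = "|", header = TRUE)

# Numeric fields are already stripped by read.table; trim the character columns
df[] <- lapply(df, function(col) if (is.character(col)) trimws(col) else col)
```

Passing `strip.white = TRUE` to `read.table` achieves the same trimming at read time, since it applies to unquoted character fields when `sep` is given.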

Allan Cameron
  • Upvote but you could `trimws` the character columns – Rui Barradas Sep 22 '22 at 19:48
  • Can you show how to save the column descriptions in another variable? Also, if possible, please explain in a bit more detail what you did above in `read.table()`. I am new, so I am having a hard time understanding it.
  • `read.table` is literally reading the table data. To get it to do this, we have to pass it only the lines of the file that contain the table, minus the line with the "----" in it, which has the index `underline`. So `underline - 1` is the header row, and `(underline + 1):length(txt)` runs from the first line of data to the end of the file. So `txt[c(underline - 1, (underline + 1):length(txt))]` is just the table we want to read. The `sep = "|"` just tells `read.table` that the cells are separated by the pipe symbol "|". – Allan Cameron Sep 22 '22 at 21:08
  • As for storing the textual data, it's not clear how you would want to store this other than in the format it is already in. But yes, it's possible I guess. What format do you want it in? – Allan Cameron Sep 22 '22 at 21:10
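For the follow-up about keeping the variable descriptions, one option is to store every non-blank line above the header as a character vector, attach it to the data frame, and optionally write it to its own file. This is a minimal sketch assuming the description is everything above the header row; it uses a made-up `txt` vector, and the names `notes` and `table1_description.txt` are hypothetical.

```r
# Made-up stand-in for the real file's lines
txt <- c(
  "Table 1 (made-up prologue)",
  "name  - object name",
  "dist  - distance, Mpc",
  "",
  "name    |dist ",
  "--------|-----",
  "ObjA    | 3.6 "
)

underline  <- grep("--------", txt, fixed = TRUE)
header_row <- underline - 1

# Everything above the header is treated as the description; blank lines are dropped
notes <- txt[seq_len(header_row - 1)]
notes <- notes[nzchar(trimws(notes))]

df <- read.table(text = txt[c(header_row, (underline + 1):length(txt))],
                 sep = "|", header = TRUE, strip.white = TRUE)

# Keep the description alongside the data as an attribute...
attr(df, "description") <- notes
# ...and/or save it to a separate text file
writeLines(notes, file.path(tempdir(), "table1_description.txt"))
```

Custom attributes may be lost after some data frame operations, so keeping the separate `notes` vector (or file) is the safer reference copy.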