
I have a list of strings that I would like to use as names for separate tables, each scraped from a different URL on a certain website.

> Fac_table
[[1]]
[1] "fulltime_fac_table"

[[2]]
[1] "parttime_fac_table"

[[3]]
[1] "honorary_fac_table"

[[4]]
[1] "retired_fac_table"

I would like to loop through the list to automatically generate 4 tables with the respective names.

The result should look like this:

> fulltime_fac_table
    Title
V1  "Professor and Department Chair"
V2  "Professor"
V3  "Professor"
V4  "Professor"
V5  "Distinguished Professor"

> parttime_fac_table
    Title       Name
V1  "Professor" "XXX"
V2  "Professor" "XXX"
V3  "Professor" "XXX"
V4  "Professor" "XXX"
V5  "Professor" "XXX"
V6  "Professor" "XXX"

I have another list, named 'headers', containing column headings of the respective tables online.

> headers
[[1]]
[1] "Title"             "Name"              "    Research area"
[4] "Contact info"

[[2]]
[1] "Title"         "Name"          "Research area" "Contact info"

I was able to assign values to the respective tables with this code:

> assign(eval(parse(text="Fac_table[[i]]")), as_tibble(matrix(fac_data, nrow = length(headers[[i]]))))
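As a side note, `eval(parse(text = "Fac_table[[i]]"))` evaluates to the same string as plain `Fac_table[[i]]`, and `matrix()` accepts a `dimnames` argument, so the column names could be attached while the matrix is built rather than in a second step. A minimal sketch, assuming made-up cell values in place of the scraped `fac_data`, hypothetical English headers, and cells arriving row by row (as in the workaround posted further down):

```r
# Hypothetical stand-ins for the scraped data (not taken from the real website)
Fac_table <- list("fulltime_fac_table", "parttime_fac_table")
headers   <- list(c("Title", "Name"), c("Title", "    Name"))
fac_data  <- c("Professor", "XXX", "Professor", "YYY")

i <- 2
# Fac_table[[i]] is already the string "parttime_fac_table"; no eval(parse()) needed
m <- matrix(fac_data,
            ncol     = length(headers[[i]]),
            byrow    = TRUE,  # assumes cells come row by row
            dimnames = list(NULL, gsub("\\s", "", headers[[i]])))  # strip stray spaces
assign(Fac_table[[i]], m)  # creates a variable named parttime_fac_table

print(parttime_fac_table)
```

If a tibble is preferred, `as_tibble(m)` keeps these column names.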

This results in a populated table, without column headings, like this one:

> honorary_fac_table
    [,1]                 [,2]
V1  "Honorary Professor" "XXX"
V2  "Honorary Professor" "XXX"
V3  "Honorary Professor" "XXX"
V4  "Honorary Professor" "XXX"

However, I was unable to assign column names to each table.

None of the following attempts worked:

> assign(colnames(eval(parse(text="Fac_table[1]"))), c(gsub("\\s", "", headers[[1]])))
Error in assign(colnames(eval(parse(text = "Fac_table[1]"))), c(gsub("\\s",  : 
  invalid first argument

> colnames(eval(parse(text="Fac_table[i]"))) <- c(gsub("\\s", "", headers[[i]]))
Error in colnames(eval(parse(text = "Fac_table[i]"))) <- c(gsub("\\s",  : 
  target of assignment expands to non-language object

> do.call("<-", colnames(eval(parse(text="Fac_table[i]"))), c(gsub("\\s", "", headers[[i]])))
Error in do.call("<-", colnames(eval(parse(text = "Fac_table[i]"))), c(gsub("\\s",  : 
  second argument must be a list

To simplify the issue, a reproducible example is as follows:

> varNamelist <- list(c("tbl1","tbl2","tbl3","tbl4"))
> colHeaderlist <- list(c("col1","col2","col3","col4"))
> tableData <- matrix(1:12, ncol=4)

This works:

> assign(eval(parse(text="varNamelist[[1]][1]")), matrix(tableData, ncol = length(colHeaderlist[[1]])))

But this doesn't:

> colnames(as.name(varNamelist[[1]][1])) <- colHeaderlist[[1]]
Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3", "col4" : 
  attempt to set 'colnames' on an object with less than two dimensions

It seems that colnames() cannot treat the strings stored in "Fac_table[i]" as the names of variables holding independent data (separate from Fac_table).

> colnames(as.name(Fac_table[[1]])) <- headers[[1]]
Error in `colnames<-`(`*tmp*`, value = c("a", "b", "c",  : 
  attempt to set 'colnames' on an object with less than two dimensions

Using 'fulltime_fac_table' directly, however, works fine:

> colnames(fulltime_fac_table) <- headers[[1]]
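The difference can be reproduced with the small example. `as.name()` only builds a symbol (an object of class `"name"`), so `colnames<-` receives that symbol rather than the matrix it refers to, and a symbol has no dimensions. A minimal sketch of one workaround in base R: fetch the object by its name with `get()`, rename a copy, and store it back under the same name with `assign()`:

```r
varNamelist   <- list(c("tbl1", "tbl2", "tbl3", "tbl4"))
colHeaderlist <- list(c("col1", "col2", "col3", "col4"))
tableData     <- matrix(1:12, ncol = 4)

assign(varNamelist[[1]][1], tableData)  # creates tbl1

# as.name() returns a symbol, not the matrix stored under that name
sym <- as.name(varNamelist[[1]][1])
print(class(sym))

# get() looks the object up by name and returns the matrix itself
tmp <- get(varNamelist[[1]][1])
colnames(tmp) <- colHeaderlist[[1]]

# assign() writes the renamed copy back under the original name
assign(varNamelist[[1]][1], tmp)
print(tbl1)
```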

Is there any way around this issue?

Thanks!

Sati
  • I'd like to help, but please read SO standards on asking questions at this [link here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It helps to simplify to the essentials. – www Sep 06 '17 at 02:38
  • A reproducible example has been duly added. Thanks. – Sati Sep 06 '17 at 03:09
  • @RyanRunge, I just need a suitable placeholder to store the given tables online, looping through a list of URLs while keeping the distinction between them. It might not even be a good idea to keep a list of variable names for the respective tables. So, if you can think of any better way to do this, please share it with me. – Sati Sep 07 '17 at 01:08
  • The tables online don't seem to have separate nodes for separate fields. – Sati Sep 07 '17 at 01:19
  • I appreciate you taking the time to clarify your question. It is a lot clearer with the smaller scale example. I guess I'm still wondering what format the data in your larger example is in, and why you'd like to use lists and the assign function for this process. That's not typically done--not that it's necessarily incorrect--but there may be an easier, more standard way to do what you're trying to do. I've edited my answer to try to explain a few tools that may be helpful in the process. – www Sep 07 '17 at 04:26

2 Answers


There is a solution to this, but if I understand correctly, the current setup may be more complex than necessary, so I'll try to make this task easier.

If you're working with one-dimensional data, I'd recommend using vectors, as they're more appropriate than lists for that purpose. So for this project, I'd begin by storing the names of tables and headers, like this:

varNamelist <- c("tbl1","tbl2","tbl3","tbl4")
colHeaderlist <- c("col1","col2","col3","col4")
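With plain character vectors the indexing also gets simpler: `varNamelist[1]` does the job of `varNamelist[[1]][1]` from the list version in the question. A short comparison:

```r
nested <- list(c("tbl1", "tbl2", "tbl3", "tbl4"))  # a list holding one vector
flat   <- c("tbl1", "tbl2", "tbl3", "tbl4")         # a plain character vector

nested[[1]][2]  # two levels of indexing to reach "tbl2"
flat[2]         # one level is enough
length(nested)  # the list has a single element
length(flat)    # one element per table name
```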

It's still difficult to tell from your question what format the input data for these tables is in and where it comes from, but in general a data frame can be easier to work with than a matrix, as long as you're not working with Big Data. The assign() function is also typically unnecessary for these sorts of steps. Instead, when setting up a data frame, we can specify the name of the data frame, the names of the columns, and the data contents all at once, like this:

tbl1 <- data.frame("col1"=c(1,2,3),
                   "col2"=c(4,5,6),
                   "col3"=c(7,8,9),
                   "col4"=c(10,11,12))

Again, we're using vectors, noted by c() instead of list(), to fill each column, since each column is its own single dimension.

To check the output of tbl1, we can then use print():

print(tbl1)

  col1 col2 col3 col4
1    1    4    7   10
2    2    5    8   11
3    3    6    9   12

If it's an option to create the tables closer to the way shown here, that might make things easier than using so many lists and assign() calls, which quickly becomes overly complicated.

But if, at the end, you want to store all the tables in a single place, you could put them in a list:
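If the tables have to be produced in a loop, one way to avoid `assign()` entirely is to build them straight into a named list. A minimal base-R sketch, assuming each table's cells arrive as one character vector filled row by row; the `cells` list and the `make_table()` helper are hypothetical:

```r
varNamelist   <- c("tbl1", "tbl2")
colHeaderlist <- list(c("col1", "col2"), c("col1", "col2", "col3"))
# Hypothetical cell contents, one vector per table, read row by row
cells <- list(c("a", "b", "c", "d"), as.character(1:6))

# Hypothetical helper: reshape one vector of cells into a named data frame
make_table <- function(x, hdr) {
  m <- matrix(x, ncol = length(hdr), byrow = TRUE)
  colnames(m) <- hdr
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Map() pairs each cell vector with its headers; setNames() labels the results
tableList <- setNames(Map(make_table, cells, colHeaderlist), varNamelist)
str(tableList)
```

Each table is then reachable as `tableList[["tbl1"]]` and so on, without creating separate variables.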

tableList <- list(tbl1=tbl1,tbl2=tbl2,tbl3=tbl3,tbl4=tbl4)

str(tableList)
List of 4
 $ tbl1:'data.frame':   3 obs. of  4 variables:
  ..$ col1: num [1:3] 1 2 3
  ..$ col2: num [1:3] 4 5 6
  ..$ col3: num [1:3] 7 8 9
  ..$ col4: num [1:3] 10 11 12
 $ tbl2:'data.frame':   3 obs. of  4 variables:
  ..$ col1: num [1:3] 1 2 3
  ..$ col2: num [1:3] 4 5 6
  ..$ col3: num [1:3] 7 8 9
  ..$ col4: num [1:3] 10 11 12
 $ tbl3:'data.frame':   3 obs. of  4 variables:
  ..$ col1: num [1:3] 1 2 3
  ..$ col2: num [1:3] 4 5 6
  ..$ col3: num [1:3] 7 8 9
  ..$ col4: num [1:3] 10 11 12
 $ tbl4:'data.frame':   3 obs. of  4 variables:
  ..$ col1: num [1:3] 1 2 3
  ..$ col2: num [1:3] 4 5 6
  ..$ col3: num [1:3] 7 8 9
  ..$ col4: num [1:3] 10 11 12
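Once the tables live in a list, a table can be looked up with a name held in a string, and `colnames<-` works on that list element directly, which is what the `as.name()` attempts in the question were after. A sketch with a small stand-in for `tableList`:

```r
tableList <- list(tbl1 = data.frame(a = 1:3, b = 4:6),
                  tbl2 = data.frame(a = 7:9, b = 10:12))
newHeaders <- c("col1", "col2")

name <- "tbl2"  # table name stored in a string, as in varNamelist
colnames(tableList[[name]]) <- newHeaders  # replaces the column names of that element

# Or rename every table in one pass; lapply() keeps the list's names
tableList <- lapply(tableList, setNames, newHeaders)
str(tableList)
```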
www
  • I have some follow-up questions based on the solution I found. But I don't think I can find a reproducible example for it. Wonder if you could help with that as well? It's an entirely different issue altogether. – Sati Sep 07 '17 at 07:53

I've found a workaround based on @Ryan's recommendation:

library(rvest)  # read_html(), html_nodes(), html_text() and the %>% pipe

# Assumes `url` (a character vector of page URLs) and `headers` (a list of header vectors) are already defined
for (i in seq_along(url)){

  webpage <- read_html(url[i]) #loop through URL list to access html data

  fac_data <- html_nodes(webpage,'.tableunder')  %>% html_text()
  fac_data1 <- html_nodes(webpage,'.tableunder1')  %>% html_text()
  fac_data <- c(fac_data, fac_data1) #Store table data on each URL in a variable 

  x <- fac_data %>% matrix(ncol = length(headers[[i]]), byrow=TRUE) #make matrix to extract column data

  for (j in seq_along(headers[[i]])){
    y <- cbind(x[,j]) #extract column data and store in temporary variable
    colnames(y) <- as.character(headers[[i]][j]) #add column name
    print(y) #print each column with its header, in header order. Note: storing the result with 'z <- cbind(y)' here overwrites z on every iteration.
  }
}
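The overwriting mentioned in the `z <- cbind(y)` comment happens because `z` is reassigned on every pass of the inner loop. One alternative, sketched below, is to keep the whole matrix for each URL, name all its columns at once, and store it in a list under the matching name from `Fac_table`. The scraping step is replaced by a hypothetical `scraped` list of cell vectors so the sketch runs offline; in the real loop `fac_data` would come from `html_text()` as above.

```r
Fac_table <- list("fulltime_fac_table", "parttime_fac_table")
headers   <- list(c("Title", "    Name"), c("Title", "Name"))
# Hypothetical scraped cells, one character vector per URL, row by row
scraped <- list(c("Professor", "AAA", "Professor", "BBB"),
                c("Professor", "CCC"))

results <- list()
for (i in seq_along(Fac_table)) {
  fac_data <- scraped[[i]]  # stands in for the html_nodes()/html_text() step
  x <- matrix(fac_data, ncol = length(headers[[i]]), byrow = TRUE)
  colnames(x) <- gsub("\\s", "", headers[[i]])  # drop stray whitespace in headers
  results[[Fac_table[[i]]]] <- x  # each table kept under its own name
}

print(results$fulltime_fac_table)
```

If separate variables are still wanted, `list2env(results, envir = globalenv())` creates one per list element in a single call.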

I am now able to print out all values, complete with headers of the data in question.

Follow-up questions have been posted here.


The final code solved this problem as well.

Sati