
When I extract metadata from an IR site, I find that a value in the resulting data frame cannot be overwritten. In the extracted metadata, the attribute named "Related URLs" has the value "查看原文" (meaning "view the original text"), which needs to be replaced by the real link from the web page.

> dput(imeta_dc)
structure(list(itemDisplayTable = structure(c(5L, 8L, 6L, 4L, 
3L, 7L, 1L, 1L, 12L, 9L, 13L, 11L, 2L, 10L), .Names = c("Title", 
"Author", "Source", "Issued Date", "Volume", "Corresponding Author", 
"Abstract", "English Abstract", "Indexed Type", "Related URLs", 
"Language", "Content Type", "URI", "专题"), .Label = c(" In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way of becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.", 
"[http://ir.las.ac.cn/handle/12502/8904] ", "1, Issue:4, Pages:1-9", 
"2016-11-03 ", "Data-driven Discovery: A New Era of Exploiting the Literature and Data", 
"Journal of Data and Information Science ", "Ying Ding (E-mail:dingying@indiana.edu) ", 
"Ying Ding; Kyle Stirling ", "查看原文 ", "期刊论文", "期刊论文 ", 
"其他 ", "英语 "), class = "factor")), .Names = "itemDisplayTable", row.names = c("Title", 
"Author", "Source", "Issued Date", "Volume", "Corresponding Author", 
"Abstract", "English Abstract", "Indexed Type", "Related URLs", 
"Language", "Content Type", "URI", "专题"), class = "data.frame")

I tried to locate the "Related URLs" value by its row and column names and change it with these statements:

meta_ru <- "http://www.jdis.org"
imeta_dc[c("Related URLs"), c("itemDisplayTable")] <- meta_ru

I use row names instead of row numbers because the metadata records differ in length and in attribute order, so names are the only way to locate an attribute reliably. When I run the assignment, no error or warning occurs, but the value is not written; the cell becomes blank instead. How can I avoid this problem?

赵鸿丰

1 Answer


The problem with your dataset is that the column itemDisplayTable is a factor. First convert it to character, then select the row by name with rownames() and assign the new value, as below.

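df <- imeta_dc  # working copy of the data frame from the question

# Convert the factor column to character so that any string can be assigned
df$itemDisplayTable <- as.character(df$itemDisplayTable)
meta_ru <- "http://www.jdis.org"

# Select the row by name and overwrite its value
df[rownames(df) == "Related URLs", "itemDisplayTable"] <- meta_ru
View(df)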

Output:

In the final output, Related URLs is no longer empty; it now contains "http://www.jdis.org".

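The behaviour can be reproduced on a small example. The sketch below is only an illustration: the two-row data frame `meta` and its contents are hypothetical stand-ins for the real metadata, and `stringsAsFactors = TRUE` is set explicitly so that the column is a factor, as in the `dput()` output of the question.

```r
# Hypothetical two-row stand-in for the metadata table in the question
meta <- data.frame(
  itemDisplayTable = c("Some title", "View source "),
  row.names = c("Title", "Related URLs"),
  stringsAsFactors = TRUE  # force a factor column, as in the dput() output
)

# Assigning a string that is not an existing level produces NA.
# R normally warns "invalid factor level, NA generated"; the warning is
# suppressed here only to keep the output quiet.
broken <- meta
suppressWarnings(
  broken["Related URLs", "itemDisplayTable"] <- "http://www.jdis.org"
)
is.na(broken["Related URLs", "itemDisplayTable"])

# Converting the column to character first makes the same assignment work
fixed <- meta
fixed$itemDisplayTable <- as.character(fixed$itemDisplayTable)
fixed["Related URLs", "itemDisplayTable"] <- "http://www.jdis.org"
fixed["Related URLs", "itemDisplayTable"]
```

The `NA` in `broken` is most likely what appears as a blank cell when the data frame is viewed.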

PKumar
  • Thank you for your kind reply. In my last test before posting this question, I tried using `as.factor` to convert `meta_ru` to a factor, but it still could not be written into the data frame. Are there any tricks for working with the factor class? – 赵鸿丰 May 23 '17 at 09:03
  • @赵鸿丰 No, as far as I know you can't manipulate a factor with character functions; you have to convert it to character. Are you getting any error when you convert from factor to character? Ideally, you should set `options(stringsAsFactors=F)` at the top of your code before you read your data.frame. – PKumar May 23 '17 at 09:08
  • @赵鸿丰 You may use `library(dplyr); df1 <- df %>% mutate_if(is.factor, as.character)` if you want to convert many factor columns to character at once. Here `df` is your data frame with factors, and `df1` is the resulting data frame without factors. – PKumar May 23 '17 at 09:21
  • Thank you for your insight. It seems that a factor is a special class of data: perhaps it is processed as a whole and cannot be changed piece by piece. When I tried to change a factor value by assignment, no warning or error message appeared, but the value turned blank. – 赵鸿丰 May 24 '17 at 02:18
  • @赵鸿丰 Ideally that shouldn't happen. When you apply character functions or operations to a factor, a warning is generated. In your case, if I don't convert the variable `itemDisplayTable` to character, I get something like this: ``Warning message: In `[<-.factor`(`*tmp*`, iseq, value = "http://www.jdis.org") : invalid factor level, NA generated``. I'm not sure why your system doesn't show it. Also, you should read this: https://stackoverflow.com/questions/3445316/factors-in-r-more-than-an-annoyance – PKumar May 24 '17 at 04:01
  • Thanks a lot. I have one last detailed question, about data frame column names. When I rebuild my data frame, I use this statement: `imeta_dc <- as.data.frame(meta_value, row.names = meta_label, col.names = 'itemDisplayTable', stringsAsFactors = FALSE)`. The row names come from a list and are set correctly, but the column name cannot be assigned: whatever I set, the column is always named after the variable that holds the values. In general, how do we set a custom column name for a data frame? Is building an empty data frame the only way? – 赵鸿丰 May 24 '17 at 10:31
  • @赵鸿丰 To assign column names, use this: `imeta_dc$meta_label <- rownames(imeta_dc)`; `rownames(imeta_dc) <- NULL`. Your imeta_dc now has a column called meta_label, and the row names should be gone. – PKumar May 25 '17 at 07:03
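
Two points raised in the comments can be sketched in base R. This is an illustration under assumptions, not the commenters' code: the contents of `meta_label` and `meta_value` are hypothetical, both are assumed to be plain character vectors (a list would need `unlist()` first), and the names `f`, `renamed` and `value` are made up for the example.

```r
# 1. Keeping the factor: register the new level before assigning it
f <- factor(c("a", "b"))
levels(f) <- c(levels(f), "http://www.jdis.org")
f[2] <- "http://www.jdis.org"  # valid level now, so no warning and no NA

# 2. Naming the column when the data frame is built:
#    the argument name passed to data.frame() becomes the column name
meta_label <- c("Title", "Related URLs")              # hypothetical labels
meta_value <- c("Some title", "http://www.jdis.org")  # hypothetical values
imeta_dc <- data.frame(
  itemDisplayTable = meta_value,
  row.names = meta_label,
  stringsAsFactors = FALSE
)

# 3. Renaming the column of an existing data frame
renamed <- imeta_dc
names(renamed) <- "value"
```

An empty data frame is not needed for either the construction or the renaming step.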