
I know how to do this in Python, but right now I need to use R. I have a large data frame with 15m records and 65 variables. I have subset the data into smaller regions of interest, and I want to use the subset name as a variable in later steps in my script. For example:

x <- subset(largedata, Name == "Test")

How do I get the name x as a variable, not as a data input?

MrFlick
JonH
  • You can use `ls()` to extract the object names, though it is not clear why you need that. Instead, this can be done in a list without creating objects in the global environment. Without knowing the exact problem, it is difficult to comment, though. – akrun Nov 20 '19 at 17:41
  • Use the `assign()` function. – smci Nov 20 '19 at 17:42
  • Sorry, I will try to be clearer. I do not want the field names in x; I just want x as a variable to feed into result names. I want the script to be used across several hundred regions, and I do not want to have to set the output to the region name each time. I would use basename in Python to do the same thing, but anything I try gets me the whole dataset with a new name. I just want the name x as a variable. – JonH Nov 20 '19 at 17:47
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It's unclear to me exactly what you are trying to do. And note that how you do things the R way might be very different from the Python way. Line-by-line translations of code are usually not a good idea. – MrFlick Nov 20 '19 at 18:28
  • As smci said, use `assign`. E.g. `x <- 'my_var'; assign(x, subset(iris, Species == "setosa"))`. – Axeman Nov 20 '19 at 18:55
  • Here is an example: I have a data table called Landscape, which has 243533 observations and 65 variables. If I do `x <- Landscape`, I get another data table with the same data, just under a different name; if I do `x <- ls(Landscape)`, I get the field names, which is useful but too much info. I simply need x = Landscape: no data, just the name of the data table as a string variable. – JonH Nov 20 '19 at 19:40
  • You don't mean `x <- "Landscape"` do you? That is a character variable that contains the name of the data frame. You can use it to label your results. – dcarlson Nov 20 '19 at 23:52
  • **To clearly explain what you're trying to do, please show the actual code you want to avoid in *"I do not want to have to set the output to the region name each time"*.** And as to *"I subset the data to smaller regions of interest and want to use the subset name as a variable in later steps..."*, this sounds like an XY problem; you could e.g. use dplyr's `largedata %>% filter(Name=='Test') %>% ...` – smci Nov 20 '19 at 23:58
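
The suggestions in the comments above (`assign()`, and keeping the name as a character string) might look something like the following. This is only a minimal sketch on the built-in `iris` data, not the asker's data; `region_name`, `region_data`, `label`, and `name_of` are hypothetical names chosen for illustration.

```r
# Keep the name as a character string first, then create an object under that name.
region_name <- "setosa"
assign(region_name, subset(iris, Species == "setosa"))

# get() looks the object up by its name; the string stays available for labelling.
region_data <- get(region_name)
label <- paste0(region_name, "_results")

# Going the other way: recover the name of an existing object as a string.
# name_of() is a hypothetical helper, not a base R function.
name_of <- function(obj) deparse(substitute(obj))
name_of(setosa)
```

Storing the name in a string up front tends to be simpler than recovering it from the object afterwards, since the string can then be reused for both lookups and output labels.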

1 Answer

An alternative approach: split your data on the Name column, which gives you a named list. Try this example:

mySplitData <- split(iris, iris$Species)

mySplitData
# $setosa
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           5.1         3.5          1.4         0.2  setosa
# ...
# $versicolor
#     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 51           7.0         3.2          4.7         1.4 versicolor
# ...
# etc

# to access by name:
mySplitData$setosa
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           5.1         3.5          1.4         0.2  setosa
# ...

In your example it would be something like:

mySplitData <- split(largedata, largedata$Name)
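
As a rough sketch of how the names of such a list could then feed into result names, here is one possibility using `iris` again. The summary columns and the `_summary.csv` file-name pattern are assumptions made purely for illustration, not something taken from the question.

```r
# Split into a named list, one data frame per group.
regions <- split(iris, iris$Species)

# names() returns the group names as plain character strings.
region_names <- names(regions)

# Loop over the names so each one is available as a string inside the loop.
results <- lapply(region_names, function(nm) {
  region <- regions[[nm]]
  data.frame(
    region = nm,                                   # the name used as a label
    n_rows = nrow(region),
    mean_sepal_length = mean(region$Sepal.Length)
  )
})
summary_table <- do.call(rbind, results)

# Hypothetical output file names built from each region name.
out_files <- paste0(region_names, "_summary.csv")
```

With several hundred regions, the same loop body runs once per name, so no output name has to be set by hand.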
zx8754