
I am running a model that writes its output to text files. It reports 17 variables, each in a file with a different name (i.e. variable1_X.out, variable2_X.out, ..., variable17_X.out), where X is the file number corresponding to the value of the parameter used in the simulation, which varies. While I was working with only a few files, I extracted the data for the variables I needed with a basic script and plotted them. However, now that I am varying some of the model parameters, each run produces a different number of files, so my script is no longer useful, and updating it for every run is not practical.

I managed to import all the files I want into R with lapply. However, I was wondering whether there is a tool or script to extract specific columns from each of those files and then plot them all together. The problem is that the order of the column names in the output files is not constant; it changes from variable to variable. The names themselves are the same in most of the files, so the selection should be based on a list of specific names (chemical species like Al+++, Ca++, Na+, ...). R is not my field of expertise.

Column layout from one of the output files.

2 Answers


I'm not sure whether the names of the variables are the same or different between the files. That said, you can use dplyr to select specific columns from a data frame, and you can also rename them. Something like this (or some combination) might work.

If the names of the variables are different, import the files and merge them into one big data frame:

library(dplyr)
df1 = rio::import("df1.csv")
df2 = rio::import("df2.csv")

# first cbind the different files into one big dataframe for ease of access
df = cbind(df1, df2) 
# Gives you the columns you select
filtered_df = df %>% dplyr::select(v1,v2,v3)

If the names of the variables are the same, import the files and select the variables separately:

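library(dplyr)

df1 = rio::import("df1.csv")
df1_select = df1 %>% select(v1, v2, v3)

df2 = rio::import("df2.csv")
# select the same columns from the second file, then rename them
# (rename() takes new_name = old_name; the result must be assigned)
df2_select = df2 %>%
  select(v1, v2, v3) %>%
  rename(v1_df2 = v1, v2_df2 = v2, v3_df2 = v3)

# cbind the selected columns; both files need the same number of rows
cbind(df1_select, df2_select)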
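
Since the number of files changes from run to run, hard-coding df1, df2, ... will not scale. As a rough sketch in base R, the file names themselves can be turned into a small index table and grouped by variable. The "variableN_X.out" pattern, the `out/` folder, and the helper names `pattern`, `index`, and `by_variable` are assumptions made for illustration; the real names will differ.

```r
# Made-up paths following the variableN_X.out pattern from the question.
# In practice they could come from something like:
#   paths <- list.files("out", pattern = "\\.out$", full.names = TRUE)
paths <- c("out/variable1_1.out", "out/variable1_2.out",
           "out/variable2_1.out", "out/variable17_10.out")

# Capture N (variable index) and X (file number) from the file name.
pattern <- "^variable([0-9]+)_([0-9]+)\\.out$"
file_names <- basename(paths)

index <- data.frame(
  path     = paths,
  variable = as.integer(sub(pattern, "\\1", file_names)),
  run      = as.integer(sub(pattern, "\\2", file_names)),
  stringsAsFactors = FALSE
)

# One small table of files per variable, regardless of how many runs exist.
by_variable <- split(index, index$variable)
```

Each element of `by_variable` can then be passed to whatever import and select step is used, without editing the script when the number of files changes.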
thehand0

You can define the column names that you want to select from each file.

col_names <- c('Al+++', 'Ca++', 'Na+')

Assuming you are using read.csv to read each file, you can select col_names from each of them like this:

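lapply(files, function(x) {
  # check.names = FALSE keeps headers such as 'Al+++' intact;
  # by default read.csv would turn them into 'Al...'
  data <- read.csv(x, check.names = FALSE)
  data[col_names]
}) -> result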

result
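
The question mentions that the names are the same in most, but not all, of the files, and that everything should be plotted together. A possible extension, sketched here under the assumption that the files are comma-separated, is to fill missing species with NA and stack the results into one data frame. The helper `read_species`, the `file` column, and the demo files with made-up numbers are illustrative only.

```r
# Species of interest (from the question; extend as needed).
col_names <- c("Al+++", "Ca++", "Na+")

# Hypothetical helper: read one file and return the wanted columns in a fixed order.
read_species <- function(path, wanted = col_names) {
  # check.names = FALSE keeps headers such as "Al+++" unchanged
  data <- read.csv(path, check.names = FALSE)
  # add an NA column for every species this file does not report
  for (nm in setdiff(wanted, names(data))) data[[nm]] <- NA_real_
  out <- data[wanted]         # select by name, so column order in the file is irrelevant
  out$file <- basename(path)  # remember which file each row came from
  out
}

# Demo: two temporary files with different column order and made-up values.
demo_dir <- tempfile("model_out_")
dir.create(demo_dir)
write.csv(data.frame(`Na+` = c(1, 2), `Al+++` = c(3, 4), `Ca++` = c(5, 6),
                     check.names = FALSE),
          file.path(demo_dir, "variable1_1.out"), row.names = FALSE)
write.csv(data.frame(`Ca++` = c(7, 8), pH = c(6.5, 6.6), `Na+` = c(9, 10),
                     check.names = FALSE),
          file.path(demo_dir, "variable1_2.out"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.out$", full.names = TRUE)

# Stack all files; every piece has the same columns, so rbind lines them up.
combined <- do.call(rbind, lapply(files, read_species))
```

Because `combined` carries a `file` column, it can be split or coloured by file when plotting, whichever plotting approach is used.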
Ronak Shah