For loop for converting character data to numeric in a data frame

Question

I have 50 data frames (different name each) with 10 (same name) columns of climate data. The first 5 columns although they are numbers, their class is "character". The rest 4 columns are already in the correct class (numeric) and the last one (named 'wind dir') is in character class so no change is needed.

I tried two ways to convert the class of those 5 columns in all 50 data frames, but nothing worked.

1st way) Firstly I've created a vector with the names of those 50 data frames and I named it onomata.

Secondly I've created a vector col_numbers2 <- c(1:5) with the number of columns I would like to convert.

Then I wrote the following code:

for(i in onomata){
  i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}

Checking the class of those first five columns I saw that nothing changed. (No error report after executing the code)

2nd way) Then I tried to use the dplyr package with a for loop and the code is as follows:

for(i in onomata){
 i <- i %>%
  mutate_at(vars(-`wind_dir`),as.numeric)

In this case, I excluded the character column, and I applied the mutate function to the whole data frame, but I received an error message :

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"

What do you think I am doing wrong ?

Thank you

Original data table (what I get when I use read.table() for each txt file:

date	Time	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02/01/15	02:00	2.4	77.0	6.4	WNW	20.9
02/01/15	03:00	2.3	77.0	11.3	NW	30.6
02/01/15	04:00	2.3	77.0	9.7	NW	20.9
02/01/15	05:00	2.3	77.0	11.3	NW	30.6
02/01/15	06:00	2.3	78.0	9.7	NW	19.3
02/01/15	07:00	2.2	79.0	12.9	NNW	35.4
02/01/15	08:00	2.4	79.0	8.0	NW	14.5
02/01/15	09:00	2.6	79.0	8.0	WNW	20.9

Data after I split data in columns 1 and 2 (date, time):

day	month	year	Hour	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02	01	15	02	2.4	77.0	6.4	WNW	20.9
02	01	15	03	2.3	77.0	11.3	NW	30.6
02	01	15	04	2.3	77.0	9.7	NW	20.9
02	01	15	05	2.3	77.0	11.3	NW	30.6
02	01	15	06	2.3	78.0	9.7	NW	19.3
02	01	15	07	2.2	79.0	12.9	NNW	35.4
02	01	15	08	2.4	79.0	8.0	NW	14.5
02	01	15	09	2.6	79.0	8.0	WNW	20.9

Hi, I understand the 50 dataframes are in your environment as separated objects. Is there a way for you to put all these dataframes in a list? — Paul, May 10 '21 at 09:27
Hi Paul! Yes, I have all 50 txt files in my hard drive, but these 5 columns I want to convert are not included in these files, I created them by breaking up 2 columns. — Kon Ath, May 10 '21 at 09:29
I can see a solution if you load these tables as a list of dataframe. Then use `lapply` or `mapply` to change the columns from character to numeric. However, when you load these files in R, are these columns considered as character? If yes, maybe there is a problem with the data itself (like wrong decimal separator, etc.) — Paul, May 10 '21 at 09:34
These 5 character class columns Paul are created by breaking 2 other character columns (date with type "01/05/19" and hour "05:00"), so I don't have the choice to read those files as numeric or something. — Kon Ath, May 10 '21 at 09:38

score 1 · Answer 1 · answered May 10 '21 at 09:40

1

Maybe the following code can help.
First, get the filenames with list.files. Second, read them all in with lapply. If read.table is not the appropriate function, read help("read.table"), it is the same page as for read.csv, read.csv2, etc. Then, coerce the first 5 columns of all data.frames to numeric in one go.

filenames <- list.files(path = "your_directory", pattern = "\\.txt")
onomata <- lapply(filenames, read.table)

onomata <- lapply(onomata, function(X){
  X[1:5] <- lapply(X[1:5], as.numeric)
  X
})

answered May 10 '21 at 09:40

Rui Barradas

70,273
8
34
66

Thank you for the Answer Rui. The problem is that these columns I am trying to convert does not exist in the original files (txt), I 've created created them by breaking 2 other character columns (column 1: date with type "01/05/19" and column 2: hour "05:00"), so the new columns are like this: col1: 01 col2: 05 col3: 19 col4: 05 col5: 00. If I had the choice to read them from the source file, your script seems fine! – Kon Ath May 10 '21 at 09:47
Maybe `lapply(onomata, function(X){X[1:5] <- apply(X[1:5], 2, as.numeric)})` is more suitable to get a list of dataframes – Paul May 10 '21 at 09:48
@H.Johnson you can include the "breaking" part in the `function(X){...}`. If you provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example it will be a lot easier to help you :) – Paul May 10 '21 at 09:50
1

@Paul `X[1:5] <- lapply(X[1:5], as.numeric)` will keep the structure of data.frame `X`, there's no need for `apply`. – Rui Barradas May 10 '21 at 10:02
1

@H.Johnson Can you post the first rows of one of the files in the question? And I do not understand if you *need* 5 columns or if a conversion to class `"Date"` or `"POSIXt"` wouldn't be better. – Rui Barradas May 10 '21 at 10:04
1

Guys thanks for your answers, in the afternoon when I'll be on my PC I will post some rows from the original txt file, and the new data frame I've created after breaking up the two columns. Thank you for your time and asnwers and see you in a few hours! – Kon Ath May 10 '21 at 10:30
@H.Johnson have you figured out how to solve your question? – Paul May 11 '21 at 05:48
@Paul no, I haven't manage to solve my "problem". I have updated my post , providing the two tables I have. First one is the data frame I get by using read.table() and the second one is what I finally have after splitting the first two columns. – Kon Ath May 13 '21 at 11:47
1

Your solution Paul worked! I shouldn't have split the character column of date and time into new character columns, but instead I should have convert them into POSIXt / and hour (with chron function), and then separate them with the method you suggested. Thank you very much! – Kon Ath May 13 '21 at 13:16

Paul · Accepted Answer · 2021-05-10T15:04:58.640

Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list in the example below). To acheive this you could use mget() (ex: mget(onomata) or list.files()).

Once this is done, you can use lapply (or mapply) to go through all your dataframes.

Solution 1

To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
  
  df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
  
  # Extract elements you need from the date & time column and store them in new columns
  df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
  df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
  df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
  df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
  df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
  
  return(df)
}

# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                 hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata:  df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

$df2
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.

Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes

For loop for converting character data to numeric in a data frame

2 Answers2