Trimming unwanted characters

Question

I have a very large CSV data set with trip records from a bike share system. I'm working with the time at which each bicycle was taken out of the rack (departure time) and the total travel time, and I want to add the two to get the arrival time at the destination station. The departure time variable is FECHA_HORA_RETIRO and the travel time variable is TIEMPO_USO. The former, which R reads as a factor, is in the following format: "23/01/2017 19:55:16". TIEMPO_USO, on the other hand, is read as character and is in the following format: "0:17:46".

> head(viajes_ecobici_2017$FECHA_HORA_RETIRO)
[1] 28/01/2017 13:51 17/01/2017 16:24 12/01/2017 16:38 25/01/2017 10:31

> head(viajes_ecobici_2017$TIEMPO_USO)
[1] "1:35:37" "0:11:17" "0:32:51" "0:31:29" "1:31:59" "0:21:43" "0:5:43"

I first used strptime to get everything into the desired format:

> viajes_ecobici_2017$FECHA_HORA_RETIRO = format(strptime(viajes_ecobici_2017$FECHA_HORA_RETIRO, format = "%d/%m/%Y %H:%M"), format = "%d/%m/%Y %H:%M:%S")

> viajes_ecobici_2017$TIEMPO_USO = format(strptime(viajes_ecobici_2017$TIEMPO_USO, format = "%H:%M:%S"), format = "%H:%M:%S")

This works for most observations. However, several observations became NA after running this code. I went back to the original data to see why and created a variable containing just the observations that became NA. Looking closer at these observations, I saw they have this format: "\t\t01/06/2017 00:01". How can I get rid of the "\t\t" while preserving the rest of the information?

Thanks in advance for your help.

When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Give examples of specific dates that fail. — MrFlick, Mar 30 '18 at 18:26
Maybe change your title, since you're really asking about trimming unwanted characters? — divibisan, Mar 30 '18 at 19:23

Ben Bolker · Accepted Answer · 2018-03-30T19:23:10.057

trimws() trims white space (including tab characters, \t) from the ends of a character variable:

 viajes_ecobici_2017$TIEMPO_USO <- trimws(viajes_ecobici_2017$TIEMPO_USO)

For what it's worth, readr::read_csv() has a built-in trim_ws argument (which is TRUE by default).
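As a rough illustration of how trimming fits into the original goal, here is a minimal sketch using made-up sample values modeled on the question. Note that the question's example shows the stray tabs in FECHA_HORA_RETIRO, so the sketch trims the date strings; the same trimws() call applies to either column. The names salida, uso, and a_segundos are hypothetical, and the UTC time zone is an assumption chosen only to keep the arithmetic free of daylight-saving effects.

```r
# Sample values modeled on the question; the second date carries the stray tabs.
salida <- c("28/01/2017 13:51", "\t\t01/06/2017 00:01")
uso    <- c("1:35:37", "0:11:17")

# Strip leading and trailing whitespace (spaces, tabs, newlines).
salida_limpia <- trimws(salida)

# Parse the departure times; tz = "UTC" is an assumption, not from the question.
salida_dt <- as.POSIXct(salida_limpia, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Convert "H:M:S" durations to seconds (a_segundos is a hypothetical helper).
a_segundos <- function(x) {
  partes <- strsplit(x, ":", fixed = TRUE)
  vapply(partes, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Arrival time = departure time + travel time in seconds.
llegada <- salida_dt + a_segundos(uso)
format(llegada, "%d/%m/%Y %H:%M:%S")
```

Computing the duration by hand avoids parsing TIEMPO_USO as a clock time, which would attach an arbitrary date to it.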

edited Mar 30 '18 at 19:23

answered Mar 30 '18 at 19:21


score 1 · Answer 2 · answered Mar 30 '18 at 19:23

Assuming that the variable with the problem is TIEMPO_USO, then a simple regex would take care of the tab characters ("\t")

viajes_ecobici_2017$TIEMPO_USO <- gsub("^\\t\\t","", viajes_ecobici_2017$TIEMPO_USO)
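The pattern above removes exactly two leading tabs. If the amount or kind of stray whitespace varies between rows, a slightly more general pattern may be safer. The following is only a sketch on made-up values (x and limpio are hypothetical names), using the POSIX class [[:space:]] to match any run of whitespace at either end of the string:

```r
# Made-up inputs with different kinds of leading/trailing whitespace.
x <- c("\t\t01/06/2017 00:01", " \t17/01/2017 16:24", "12/01/2017 16:38\t")

# Remove any run of whitespace anchored at the start or the end of each string;
# the single space inside each value is left untouched.
limpio <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)
limpio
```

This has the same effect as trimws() on these inputs, so which of the two to use is mostly a matter of preference.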

answered Mar 30 '18 at 19:23

Nicolás Velasquez
