I have a dataframe with several columns read from a .tsv file and want to convert one of them to numeric for analysis. However, I keep getting the "NAs introduced by coercion" warning and do not know exactly why. The only formatting I did was to strip some unnecessary info from the beginning of another column.
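To be clear about which warning I mean, here is a tiny standalone example with made-up values (not from my actual file) that produces the same message:
# toy character vector with one entry that cannot be parsed as a number
x <- c("12.3", "45.6", "not a number")
as.numeric(x)
# [1] 12.3 45.6   NA
# Warning message:
# NAs introduced by coercion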
Originally, I thought the file might contain some extra tabs or spaces, which is why I tried to strip them with sub() inside mutate().
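To check whether there really is stray whitespace, this is roughly what I looked at (data_2018 is the tibble from read_tsv() in the code further down, and the quarter column names are as they appear in my file):
# rough check for stray spaces or tabs in one of the raw quarter columns
any(grepl("[ \t]", data_2018$`2018Q1`))
# peek at the raw values that still become NA after as.numeric()
bad <- data_2018$`2018Q1`[is.na(suppressWarnings(as.numeric(data_2018$`2018Q1`)))]
head(unique(bad))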
I should also mention that I get the same warning even when I skip the sub() replacement entirely and just call as.numeric() on the columns as they are:
library(tidyverse)

data_2018 <- read_tsv('teina230.tsv')
data_1995 <- read_csv('OECD_1995.csv')

# get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo', '2018Q1', '2018Q2', '2018Q3', '2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo')

clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]

# remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country = gsub('\\GD,S13,PC_GDP,', '', country))
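
# strip possible stray spaces and try to convert the quarter columns to numeric
# (this is the step that triggers the 'NAs introduced by coercion' warning)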
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    '2018Q1' = as.numeric(sub("", "", '2018Q1', fixed = TRUE)),
    '2018Q2' = as.numeric(sub(" ", "", '2018Q2', fixed = TRUE)),
    '2018Q3' = as.numeric(sub(" ", "", '2018Q3', fixed = TRUE)),
    '2018Q4' = as.numeric(sub(" ", "", '2018Q4', fixed = TRUE))
  )
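For completeness, this is roughly how I have been checking the result after that last mutate(); the quarter columns come back entirely NA:
# count the NAs in the converted quarter columns
colSums(is.na(clean_data_2018[, c('2018Q1', '2018Q2', '2018Q3', '2018Q4')]))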
Is there another way to get around the problem and convert the columns without all of the values being replaced with NA?
Thanks guys :)