
I have a data frame with several columns read from a .tsv file and want to convert some of them to the 'numeric' type for analysis. However, I keep getting the "NAs introduced by coercion" warning and do not know exactly why. The only formatting I did was to remove some unnecessary info at the beginning of another column.

Originally, I thought the file might contain some extra tabs or spaces, which is why I tried to delete them by wrapping each column in sub() inside as.numeric().

I should also mention that I get the same NA warning when I skip the replacement and convert the columns as they are:

library(tidyverse)

data_2018 <- read_tsv('teina230.tsv')
data_1995 <- read_csv('OECD_1995.csv')

#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]

#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    '2018Q1'=as.numeric(sub("", "", '2018Q1', fixed = TRUE)),
    '2018Q2'=as.numeric(sub(" ", "", '2018Q2', fixed = TRUE)),
    '2018Q3'=as.numeric(sub(" ", "", '2018Q3', fixed = TRUE)),
    '2018Q4'=as.numeric(sub(" ", "", '2018Q4', fixed = TRUE))
    )

Is there another way to get around the problem and convert the column without replacing all the values with 'NA'?

Thanks guys :)

  • It's impossible to say without seeing your data, but a likely place is those `as.numeric` statements. Try `as.numeric('a')` and you'll see that you get that warning if there's anything that's not a number in that data – divibisan Jun 05 '19 at 21:09
  • Actually, what are you calling `sub` on in that `mutate` statement? If `2018Q1` is supposed to be a variable in `clean_data_2018`, you need to provide its name as a bare name _without_ quotes. With quotes, you're calling `sub` on the actual string `'2018Q1'` – divibisan Jun 05 '19 at 21:10
  • I tried to run it without the quotes, but strangely enough I ran into problems while executing. It returns an "unexpected token 'Q1'" etc. error for the four variables :$ – schroederadrian Jun 05 '19 at 21:35
  • You need to use bare names to subset with `$`, only square bracket subsetting lets you use a string to select variables. Also, in your first `sub` statement, you're replacing nothing with nothing (I assume it should be a space) – divibisan Jun 05 '19 at 21:42
  • There are a lot of typos here and without a [mcve] of your data, there's no way to reproduce this problem and try to solve it. Take a look at this question to learn how to modify your question to be answerable: [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/8366499) – divibisan Jun 05 '19 at 21:43
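
The points raised in the comments can be checked without the original file. The following is a minimal sketch in base R, assuming a made-up data frame `df` with invented values (not the asker's data). It shows that a quoted name inside `sub()` is just a string, and that a name starting with a digit has to be wrapped in backticks (or passed as a string to `[[ ]]`) to be read as a column, which is probably why the bare `2018Q1` produced the "unexpected token" error.

```r
# A quoted name is a plain string, not a column. sub() finds no space to
# remove, and "2018Q1" is not a number, so as.numeric() returns NA with the
# "NAs introduced by coercion" warning (suppressed here to keep output clean).
quoted <- suppressWarnings(as.numeric(sub(" ", "", "2018Q1", fixed = TRUE)))
is.na(quoted)

# Hypothetical stand-in for the real data; the values are invented.
df <- data.frame(
  `2018Q1` = c("1.5 ", " 2.0"),
  check.names = FALSE,        # keep the non-syntactic column name as is
  stringsAsFactors = FALSE
)

# A name that starts with a digit is not a valid bare name, so it needs backticks.
bare <- as.numeric(sub(" ", "", df$`2018Q1`, fixed = TRUE))

# Square-bracket subsetting accepts the name as an ordinary string.
bracket <- as.numeric(sub(" ", "", df[["2018Q1"]], fixed = TRUE))

bare
bracket
```

Both `bare` and `bracket` hold the converted numbers, while `quoted` is a single `NA`.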

1 Answer

Thanks for the hint @divibisan !

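Renaming the columns via rename() actually solved the problem: with names such as quarter_1, the columns can be referenced inside mutate() as bare names, whereas the quoted '2018Q1' was treated as a literal string. Here is the code that finally worked: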

library(tidyverse)

data_2018 <- read_tsv('teina230.tsv')

#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo',
         quarter_1 = '2018Q1',
         quarter_2 = '2018Q2',
         quarter_3 = '2018Q3',
         quarter_4 = '2018Q4')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]

#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    quarter_1 = as.numeric(quarter_1),
    quarter_2 = as.numeric(quarter_2),
    quarter_3 = as.numeric(quarter_3),
    quarter_4 = as.numeric(quarter_4)
    )
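
Renaming is probably not the only way out. As a minimal sketch, assuming a made-up tibble `toy` standing in for `clean_data_2018` (the values and the `quarter_cols` helper are hypothetical), the original column names can be kept if they are wrapped in backticks, or all four conversions can be written once with `across()` from dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical stand-in for clean_data_2018; the values are invented.
toy <- tibble(
  country  = c("AT", "BE"),
  `2018Q1` = c("74.1 ", "106.3"),
  `2018Q2` = c("73.5", " 105.9")
)

# Option 1: keep the original name and refer to it with backticks.
one_column <- toy %>%
  mutate(`2018Q1` = as.numeric(trimws(`2018Q1`)))

# Option 2: convert several columns at once; all_of() takes the names as strings.
quarter_cols <- c("2018Q1", "2018Q2")
converted <- toy %>%
  mutate(across(all_of(quarter_cols), ~ as.numeric(trimws(.x))))

converted
```

`trimws()` removes leading and trailing whitespace before the conversion. Any value that is still not a number after trimming would become `NA` with the coercion warning, so the warning remains a useful signal to inspect the raw strings.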