I have a large data set that I have imported from Excel to R. I want to get all the entries that have a negative value for a specific variable, MG. I use the code:

A <- subset(df, MG < 0)

However, A comes back empty, even though there are several entries with a value below 0. This is not the case when I look for values larger than 0 (MG > 0). I should add that there are NA values in the data, but adding na.rm = TRUE does not help.

I also notice that R treats MG as a binary true/false variable since it sometimes contains 1 and 0.

Any idea what I have done wrong?

edit:

Country Region      Code Product name Year Value MG
Sweden  Stockholm   123  Apple        1991 244   NA
Sweden  Kirruna     123  Apple        1987 100   NA
Japan   Kyoto       543  Pie          1987 544   NA
Denmark Copenhagen  123  Apple        1998 787   0
Denmark Copenhagen  123  Apple        1987 100   1
Denmark Copenhagen  543  Pie          1991 320   0
Denmark Copenhagen  126  Candy        1999 200   1
Sweden  Gothenburg  126  Candy        2013 300   0
Sweden  Gothenburg  157  Tomato       1987 150   -55
Sweden  Stockholm   125  Juice        1987 250   150
Sweden  Kirruna     187  Banana       1998 310   250
Japan   Kyoto       198  Ham          1987 157   1000
Japan   Kyoto       125  Juice        1987 550   -1
Japan   Tokyo       125  Juice        1991 100   0
KGB91
  • 2
    What does `str(df)` produce? – Dason Sep 24 '18 at 14:10
  • It just reproduces the data. If I try to assign it to `B`, it becomes a null. – KGB91 Sep 24 '18 at 14:13
  • 3
    I wouldn't expect assigning the output of str to be useful - but showing us the output from `str(df)` (and maybe `summary(df)` too) could help us. – Dason Sep 24 '18 at 14:15
  • From the latter one, `summary`, it is clear that R thinks it is a `Mode :logical`, which is not the case in reality. How do I make R understand it is not a logical value? – KGB91 Sep 24 '18 at 14:17
  • 1
    If it's logical then it can't be taking on negative values. Are you sure you actually have negative values in that variable? – Dason Sep 24 '18 at 14:18
  • 2
    Right now a lot of what you're saying is contradictory. Without providing some sort of minimal reproducible example it will be hard to help. Read this for examples of how to create a good example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Dason Sep 24 '18 at 14:19
  • Yes I have. The code works fine for the variables where R does not think it is logical. No variable in the data is logical, even if R seems to think there are some cases where the variables are logical. – KGB91 Sep 24 '18 at 14:20
  • Thanks for the link! The only thing I want to do is to make R understand that `MG` is not logical, but numeric. – KGB91 Sep 24 '18 at 14:21
  • 2
    Can you post sample data? Please edit **the question** with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. – Rui Barradas Sep 24 '18 at 14:21
  • It looks something like the edit. I want to get R to understand that `MG` is numerical data. – KGB91 Sep 24 '18 at 14:24
  • 1
    You can set the mode easily with `mode(df$MG) <- "numeric"`, but that won't solve how it got that way in the first place, which is unclear so far. – Anonymous coward Sep 24 '18 at 14:25
  • I cannot reproduce the error with your data, please post the code that creates the df so that we can check for syntax – Chabo Sep 24 '18 at 14:27
  • `library(readxl); df <- read_excel("Test/df.xlsx"); View(df)` – KGB91 Sep 24 '18 at 14:28
  • Seems like `mode(df$MG) <- "numeric"` helped in the real data. Thanks a lot :) – KGB91 Sep 24 '18 at 14:33

1 Answer

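From your comments it looks like you're using read_excel to read in the data. It does not look at the whole column to figure out what type the data probably is: by default it only inspects the first rows (up to 1000, controlled by the guess_max argument), so a column whose early cells are empty can end up guessed as logical. You can bypass the part where it "guesses" by passing col_types, so that when it reads in it knows that MG is numeric.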

library(readxl)
# One type per column, in file order; the last entry forces MG to numeric
df <- read_excel("Test/df.xlsx",
                  col_types = c("text", "text", "numeric", "text", "numeric", "numeric", "numeric"))
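
To see why the comparison comes back empty once a column is logical, here is a small base R sketch. The vectors are made up for illustration (the second one reuses a few MG values from the question), and how the import maps the original numbers to TRUE/FALSE is an assumption here, not something taken from your file.

```r
# Hypothetical stand-in for MG after it was imported as logical:
# only TRUE/FALSE/NA are left, the original numbers are gone.
mg_logical <- c(NA, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
df_logical <- data.frame(id = seq_along(mg_logical), MG = mg_logical)

# In a comparison TRUE/FALSE behave as 1/0, so nothing is ever below zero.
neg_logical <- subset(df_logical, MG < 0)
nrow(neg_logical)

# Converting afterwards changes the type but can only produce 0, 1 or NA.
mg_converted <- as.numeric(mg_logical)
mg_converted

# The same column when it is numeric from the start.
df_numeric <- data.frame(id = 1:7, MG = c(NA, 0, 1, 0, -55, 150, -1))
neg_numeric <- subset(df_numeric, MG < 0)
neg_numeric$MG
```

So if the column really did arrive as logical, it is probably worth re-reading the file with explicit column types and checking that the negative values are there, rather than relying on a conversion after the fact.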
Dason