
I am running the following code to summarize by the INV_ITEM_ID column:

    temptable <- temptable[, lapply(.SD, sum), by = INV_ITEM_ID,
                           .SDcols = c("Ext Sale", "Ext Total Cost", "CE100",
                                       "CE110", "CE120", "QTY_SOLD", "PACKSLIP_WHSL")]

The problem is that INV_ITEM_ID is of character type. I think I need to convert it to numeric so that the data are summarized properly.

How can I go about doing this? Currently the code summarizes the data, but the result does not contain only distinct values.
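
For illustration, here is a minimal sketch of the grouping call above on a tiny, made-up table (the `items` table and its values are hypothetical, not the asker's data). With `by =` on a character column, data.table groups on exact string equality, so IDs that only look identical, such as `"1001"` and `"1001 "` (trailing space), land in separate groups:

```r
library(data.table)

# Hypothetical toy data: the third ID carries a trailing space
items <- data.table(
  INV_ITEM_ID = c("1001", "1001", "1001 ", "2002"),
  `Ext Sale`  = c(10, 5, 2, 7),
  QTY_SOLD    = c(1, 1, 1, 3)
)

# Same pattern as the question: sum the .SDcols columns per INV_ITEM_ID
summary_dt <- items[, lapply(.SD, sum), by = INV_ITEM_ID,
                    .SDcols = c("Ext Sale", "QTY_SOLD")]
print(summary_dt)
```

Here `summary_dt` has three rows ("1001", "1001 " and "2002") even though only two IDs are intended.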

Frank
J fast
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jun 27 '18 at 19:29
  • Possible duplicate of [Can't convert character to numeric in R](https://stackoverflow.com/questions/49294602/cant-convert-character-to-numeric-in-r) –  Jun 27 '18 at 19:33
  • The `by=` column should show distinct values when given a column of class character... without a concrete example, I guess it is very hard to figure out from our end. – Frank Jun 27 '18 at 20:51
  • Yes, the `by` column is character, but when I run the script it returns 5,135,153 rows with only 933,049 distinct values. I thought switching it to numeric would solve the issue, but now I don't think that is the solution. – J fast Jun 28 '18 at 13:43
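
Because a data.table `by =` result always has one row per distinct key string, one way to sanity-check output like this is to compare `nrow()` with `uniqueN()` on the result, then count again after normalizing the IDs. The sketch below assumes, purely for illustration, that stray leading or trailing white space is what differs; the `temptable` contents are hypothetical.

```r
library(data.table)

# Hypothetical stand-in for the asker's table
temptable <- data.table(
  INV_ITEM_ID = c("A1", "A1", "B2", " B2", "B2 "),
  `Ext Sale`  = c(1, 2, 3, 4, 5)
)

result <- temptable[, lapply(.SD, sum), by = INV_ITEM_ID,
                    .SDcols = "Ext Sale"]

# With by=, the row count and the number of distinct raw keys always match
n_rows     <- nrow(result)
n_distinct <- uniqueN(result$INV_ITEM_ID)
print(c(rows = n_rows, distinct = n_distinct))

# Distinct IDs after stripping surrounding white space
n_normalized <- uniqueN(trimws(result$INV_ITEM_ID))
print(n_normalized)
```

If `n_normalized` is noticeably smaller than `n_rows`, the "duplicates" differ as raw strings, and converting to numeric is not what fixes them.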

2 Answers


I think this is a spin on the question answered here: Converting a factor to numeric without losing information R (as.numeric() doesn't seem to work)

Can you wrap INV_ITEM_ID with these calls: as.numeric(as.character(INV_ITEM_ID))?
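
This conversion matters only if `INV_ITEM_ID` is a factor; for a character column, the inner `as.character()` is a no-op. A short sketch with made-up IDs shows why the inner call is needed for factors:

```r
# A factor whose labels look like numbers (hypothetical values)
ids <- factor(c("1005", "20", "1005", "300"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(ids)

# Converting through character first recovers the actual values
values <- as.numeric(as.character(ids))

print(codes)
print(values)
```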

Melissa Key
  • When wrapping it in as.numeric(as.character(INV_ITEM_ID)), the result still does not contain only distinct values. – J fast Jun 27 '18 at 19:52

I just ran into this problem myself with reshape2. First, make sure the variable really is of type character:

    is.character(your_data$INV_ITEM_ID)

This should return `TRUE`.

If so, then:

    your_data$INV_ITEM_ID <- as.numeric(your_data$INV_ITEM_ID)

I had issues with `sapply()` reporting a column as one data type while `is.numeric()` reported another.
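
One thing worth checking before converting: `as.numeric()` turns any string that is not a valid number into `NA` (with a warning), which could leave a column that looks empty. A minimal sketch with hypothetical IDs:

```r
# Hypothetical IDs: some are not purely numeric
ids <- c("1001", " 1002", "A-17", "00042", "")

stopifnot(is.character(ids))

# Non-numeric strings become NA; surrounding white space is ignored
nums <- suppressWarnings(as.numeric(ids))
print(nums)

# Inspect which values failed to convert before replacing the column
bad <- ids[is.na(nums)]
print(bad)
```

Note also that numeric conversion can merge IDs that differ as strings (for example, `"00042"` and `"42"` both become `42`), so it is safest to inspect the failures before overwriting the column.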

caszboy
  • Using that does allow me to run the script; however, it adds the column, but the column is blank. – J fast Jun 28 '18 at 14:13
  • When you ran `is.character()`, did it return `TRUE`? – caszboy Jun 28 '18 at 16:41
  • Yes, it did return `TRUE`. – J fast Jun 28 '18 at 16:49
  • To simplify, I ran the data with just one other column, using the following code: `temptable <- temptable[, lapply(.SD, sum), by = INV_ITEM_ID, .SDcols = c("Ext Sale")]`. I kept INV_ITEM_ID as character, and it still returned 5 million rows that were not all distinct. – J fast Jun 28 '18 at 16:50
  • Sounds like a string issue. Have you tried `trim()`? Maybe there are white spaces. Also try `sum(!duplicated(df$column_name))` to count the number of distinct values in the column. – caszboy Jun 29 '18 at 12:48
  • I get the following: `sum(!duplicated(mastertable$INV_ITEM_ID))` returns `[1] 94397`, which is far fewer than 5 million rows. What does the trim function do exactly? How would I go about adding it to my script? – J fast Jun 29 '18 at 13:11
  • Also, is there a way to find out whether any values have leading white space? – J fast Jun 29 '18 at 13:13
  • I have tried the trim function, but I still can't get the summary to leave only distinct values. – J fast Jun 29 '18 at 13:33
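
Base R does not have a `trim()` function; the built-in equivalent is `trimws()` (available since R 3.2.0). The sketch below, on a hypothetical `mastertable`, flags IDs with leading or trailing white space using `grepl()`, strips it with `trimws()`, and then re-runs the grouping:

```r
library(data.table)

# Hypothetical table with white-space variants of the same ID
mastertable <- data.table(
  INV_ITEM_ID = c("1001", "1001 ", " 1001", "2002", "2002"),
  `Ext Sale`  = c(1, 2, 3, 4, 5)
)

# Flag IDs with leading or trailing white space
has_space <- grepl("^[[:space:]]|[[:space:]]$", mastertable$INV_ITEM_ID)
print(sum(has_space))

# Strip leading/trailing white space in place
mastertable[, INV_ITEM_ID := trimws(INV_ITEM_ID)]

summary_dt <- mastertable[, lapply(.SD, sum), by = INV_ITEM_ID,
                          .SDcols = "Ext Sale"]

# The summary should now have one row per distinct cleaned ID
stopifnot(nrow(summary_dt) == sum(!duplicated(mastertable$INV_ITEM_ID)))
print(summary_dt)
```

If the row count is still too high after trimming, the IDs may differ in some other way (case, leading zeros, or invisible characters that `trimws()` does not remove by default), which a small reproducible sample would help pin down.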