7

I have data like dataframe df_a, and want to have it converted to the format as in dataframe df_b.

xtabs() gives similar result, but I did not find a way to access elements as in the example code below. Accessing through xa[1,1] gives no advantage since there is a weak correlation between indexing by numbers ("1") and names ("A"). As you can see there is a sort difference in the xtabs() result, so xa[2,2]=2 and not 0 as on the df_b listing.

    > df_a
      ItemName Feature Amount
    1    First       A      2
    2    First       B      3
    3    First       A      4
    4   Second       C      3
    5   Second       C      2
    6    Third       D      1
    7   Fourth       B      2
    8   Fourth       D      3
    9   Fourth       D      2
    > df_b
      ItemName A B C D
    1    First 6 3 0 0
    2   Second 0 0 5 0
    3    Third 0 0 0 1
    4   Fourth 0 2 0 5
    > df_b$A
    [1] 6 0 0 0

    > xa<-xtabs(df_a$Amount~df_a$ItemName+df_a$Feature)
    > xa
                 df_a$Feature
    df_a$ItemName A B C D
           First  6 3 0 0
           Fourth 0 2 0 5
           Second 0 0 5 0
           Third  0 0 0 1
    > xa$A
    Error in xa$A : $ operator is invalid for atomic vectors

There is a way of iterative conversion with for() loops, but totally inefficient in my case because my data has millions of records.

For the purpose of further processing my required output format is dataframe. If anyone solved similar problem please share.

cineS.
  • 93
  • 1
  • 7

2 Answers2

11

You can just use as.data.frame.matrix(xa)

# output
       A B C D
First  6 3 0 0
Fourth 0 2 0 5
Second 0 0 5 0
Third  0 0 0 1

## or
df_b <- as.data.frame.matrix(xa)[unique(df_a$ItemName), ]
data.frame(ItemName = row.names(df_b), df_b, row.names = NULL)
# output
  ItemName A B C D
1    First 6 3 0 0
2   Second 0 0 5 0
3    Third 0 0 0 1
4   Fourth 0 2 0 5
nghauran
  • 6,648
  • 2
  • 20
  • 29
3

Without using xtabs you can do something like this:

df %>%
 dplyr::group_by(ItemName, Feature) %>%
 dplyr::summarise(Sum=sum(Amount, na.rm = T)) %>%
 tidyr::spread(Feature, Sum, fill=0) %>%
 as.data.frame()

This will transform as you require and it stays as a data.frame

Or, you can just as.data.frame(your_xtabs_result) and that should work too

morgan121
  • 2,213
  • 1
  • 15
  • 33