0

Maybe this can't even be done because I have my data in a bad format, but I'm trying to get it from a "kind of" wide format to a long format. If I modify the data in excel first, I know how to do this with melt, but I have a LOT of data, and I'm hoping to not have to modify it all by hand first.

Basically I have my headers have depth where I have multiple descriptors for each column of sales. Account, Brand, Variety and Tier (and maybe some others as well depending on how this goes)

            Account
            Brand
            Variety
            Tier
Dates   
1/1/2011    Sales xxx
2/1/2011    Sales xxx

And the format I need it in would be

Date Account Brand Variety Tier Sales

I don't understand how to convert this using reshape or melt though. Any suggestions, or do I just have to do it by hand?

EDIT

Actual data looks something like this (names changed to protect the innocent)

Account   Account A Account A   Account A   Account A   Account B
Brand     Brand A   Brand A         Brand B         Brand B         Brand C
Variety   Cab           Chard           Merlot          Pinot Grigio    Pinot Noir
 Tier     Tier 2    Tier 3          Tier 2          Tier 1          Tier 3
Date                    
6/1/20     35            47              35               28               -   
7/1/2011   12            17              12               12              84               
8/1/2011   -             -               -                -                -   
9/1/2011   7             7           7            7               84 
10/1/2011 28             28              28              28              112 
11/1/2011                                            168 
12/1/2011  7             7           7            7              224 
1/1/2012                 14                              236 

If I massage it in excel to look like this

    Date    Compilation_Red Compilation_White   Oak_Cab Oak_Chard
1   11/1/2012   351 140 183 190
2   12/1/2012   191 99  120 92
3   1/1/2013    96  56  87  51
4   2/1/2013    30  30  84  61
5   3/1/2013    96  70  100 57
6   4/1/2013    60  38  52  22
7   5/1/2013    12  17  65  49
8   6/1/2013    9   7   28  21
9   7/1/2013    16  13  23  22
10  8/1/2013    26  29  40  33
11  9/1/2013    14  4   30  23
12  10/1/2013   17  4   47  31
13  11/1/2013   23  5   63  35
14  12/1/2013   25  3   70  41
15  1/1/2014    44  1   88  41
16  2/1/2014    28  7   57  43

Then I know how to use melt to get it into long format

myItem=melt(Account, id.vars="Date", variable.name="Variety", value.name="Sales")

I just don't understand how to deal with all the extra headers, and get them into the long format.

Also, I don't understand why my table looks fine when I put it in, but is mashed together in the preview.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Krone
  • 43
  • 10
  • 2
    You'll get much more help if you [make your question reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), at the very least post some actual data – alexwhan Apr 22 '14 at 23:43
  • Please post a better sample of the data and exact result you're looking for. Also, please show us the code you've tried so far. – Rich Scriven Apr 22 '14 at 23:43
  • Use dput() to let us try the code ourself. – MichaelVE May 06 '15 at 01:48

2 Answers2

0

I hope this way will help you.

devtools::install_github("yikeshu0611/onetree")

onetree is my own package in github. There is a function: reshape_toLong, which is used to transform wide data to long format.

library(onetree)

Your problem happened at the title, however, from your data posed above, it is difficult to identify the title. If you have a long title like this

wide=read_Text("
AccountBrandVarietyTierDate AccountABrandACabTier2 AccountABrandAChardTier3   AccountABrandBMerlotTier2   AccountABrandBPinotGrigioTier1   AccountBBrandCPinotNoirTier3
6/1/20     35            47              35               28               -   
7/1/2011   12            17              12               12              84               
8/1/2011   -             -               -                -                -   
9/1/2011   7             7               7                7               84 
10/1/2011  28            28              28               28              112 
11/1/2011  -             -               -               168                -
12/1/2011  7             7                7               7              224 
1/1/2012   -             14              -              236               -")


colnames(wide)[1]="date" #change the id var to be different

long = reshape_toLong(data = wide,
                       id = "date",
                        j = "newcolumn",
          value.var.prefix = "Account")

head(long,10)

        date         newcolumn Account
1     6/1/20   ABrandACabTier2      35
2   7/1/2011   ABrandACabTier2      12
3   8/1/2011   ABrandACabTier2       -
4   9/1/2011   ABrandACabTier2       7
5  10/1/2011   ABrandACabTier2      28
6  11/1/2011   ABrandACabTier2       -
7  12/1/2011   ABrandACabTier2       7
8   1/1/2012   ABrandACabTier2       -
9     6/1/20 ABrandAChardTier3      47
10  7/1/2011 ABrandAChardTier3      17

Then we use Left, Right, Mid and Reverse function to deal with newcolumn column.

long$AccountABC = Left(long$newcolumn,1)
long$Brand = Mid(long$newcolumn,2,6)
long$Tier=Right(long$newcolumn,5)
Variety1= Mid(long$newcolumn,8,60)
Variety2=Mid(Reverse(Variety1),6,60)
long$Variety=Reverse(Variety2)
head(long,20)
        date          newcolumn Account AccountABC  Brand  Tier Variety
1     6/1/20    ABrandACabTier2      35          A BrandA Tier2     Cab
2   7/1/2011    ABrandACabTier2      12          A BrandA Tier2     Cab
3   8/1/2011    ABrandACabTier2       -          A BrandA Tier2     Cab
4   9/1/2011    ABrandACabTier2       7          A BrandA Tier2     Cab
5  10/1/2011    ABrandACabTier2      28          A BrandA Tier2     Cab
6  11/1/2011    ABrandACabTier2       -          A BrandA Tier2     Cab
7  12/1/2011    ABrandACabTier2       7          A BrandA Tier2     Cab
8   1/1/2012    ABrandACabTier2       -          A BrandA Tier2     Cab
9     6/1/20  ABrandAChardTier3      47          A BrandA Tier3   Chard
10  7/1/2011  ABrandAChardTier3      17          A BrandA Tier3   Chard
11  8/1/2011  ABrandAChardTier3       -          A BrandA Tier3   Chard
12  9/1/2011  ABrandAChardTier3       7          A BrandA Tier3   Chard
13 10/1/2011  ABrandAChardTier3      28          A BrandA Tier3   Chard
14 11/1/2011  ABrandAChardTier3       -          A BrandA Tier3   Chard
15 12/1/2011  ABrandAChardTier3       7          A BrandA Tier3   Chard
16  1/1/2012  ABrandAChardTier3      14          A BrandA Tier3   Chard
17    6/1/20 ABrandBMerlotTier2      35          A BrandB Tier2  Merlot
18  7/1/2011 ABrandBMerlotTier2      12          A BrandB Tier2  Merlot
19  8/1/2011 ABrandBMerlotTier2       -          A BrandB Tier2  Merlot
20  9/1/2011 ABrandBMerlotTier2       7          A BrandB Tier2  Merlot
zhang jing
  • 141
  • 9
0

If your data is like this, it is more easy.

wide=read.table(header = FALSE, text = ("
Account   AccountA AccountA   AccountA   AccountA       AccountB
Brand     BrandA   BrandA     BrandB     BrandB         BrandC
Variety   Cab      Chard      Merlot     PinotGrigio    PinotNoir
Tier      Tier2   Tier3     Tier2        Tier1           Tier3
6/1/20     35      47          35        28               -   
7/1/2011   12      17          12        12              84               
8/1/2011   -       -           -         -                -   
9/1/2011   7       7           7         7               84 
10/1/2011  28      28          28        28              112 
11/1/2011  -       -           -         168              -
12/1/2011  7       7           7         7               224 
1/1/2012   -       14          -         236              -
          "))

we can change row1 to row4 as one string and treated as column names.

for (i in 1:ncol(wide)) {
  colnames(wide)[i]=inner_Add_Symbol(wide[1:4,i],"_")
}

Then we delet from row1 to row4, and change column names 1 as date, which will be treated as id.

wide2=wide[-(1:4),]
colnames(wide2)[1]="date"

key step reshape to long data

long=reshape_toLong(data = wide2,
                      id="date",
                      j="newcolumn",
    value.var.prefix = "Account")

split the column in this dataframe

head(split_byonecolumn(long,"newcolumn","_",c("AccountABC","Brand","Variety","Tier")))

       date          newcolumn Account AccountABC  Brand Variety  Tier
1    6/1/20 A_BrandA_Cab_Tier2      35          A BrandA     Cab Tier2
2  7/1/2011 A_BrandA_Cab_Tier2      12          A BrandA     Cab Tier2
3  8/1/2011 A_BrandA_Cab_Tier2       -          A BrandA     Cab Tier2
4  9/1/2011 A_BrandA_Cab_Tier2       7          A BrandA     Cab Tier2
5 10/1/2011 A_BrandA_Cab_Tier2      28          A BrandA     Cab Tier2
6 11/1/2011 A_BrandA_Cab_Tier2       -          A BrandA     Cab Tier2
zhang jing
  • 141
  • 9