0

I'm having trouble merging large monthly data using Excel, so I wanted to learn to use R to consolidate my dataset. I do not know to consolidate my data on a customer level (unique), please see sample data.

df <- read.table(text="
Customer Payment Snapshot 
10001 20000 31-Jan-00
10001 19000 29-Feb-00
10001 18000 31-Mar-00
10001 17000 30-Apr-00
10001 16000 31-May-00
10001 15000 30-Jun-00
10001 14000 31-Jul-00
10001 13000 31-Aug-00
10001 12000 30-Sep-00
10001 11000 31-Oct-00
10001 10000 30-Nov-00
10001 9000 31-Dec-00
10002 50000 31-Jan-00
10002 48500 29-Feb-00
10002 47000 31-Mar-00
10002 45500 30-Apr-00
10002 44000 31-May-00
10002 42500 30-Jun-00
10002 41000 31-Jul-00
10002 39500 31-Aug-00
10002 38000 30-Sep-00
10002 36500 31-Oct-00
10002 35000 30-Nov-00
10002 33500 31-Dec-00", header=T)

In above data, we can see the monthly payments of the customer, however, I want it to be unique and the payments should be per column like this:

payments should be per column like this

joeljpa
  • 317
  • 2
  • 13
Yhan
  • 3
  • 1
  • Hi Yhan! Welcome to StackOverflow – Mark Jul 21 '23 at 03:35
  • 1
    Your question will get answered 10 times faster if you do the following things: 1. include any code you have, 2. include any data you have as the actual data, not screenshots – Mark Jul 21 '23 at 03:36
  • It sounds like you don't know the beginnings of how to use R though. Maybe a more basic tutorial would be useful. – Mark Jul 21 '23 at 03:37
  • I'd ask: what is it that you plan to do with the data next? Very often in R, data in a "long" form is easier to work with than a "wide" form. You can, for example, group on `Customer` to do calculations on `Payment`. – neilfws Jul 21 '23 at 03:49

1 Answers1

1

Learn how to import Excel data into R through reading the excellent documentation for readxl. Then:

library(tidyverse)

df %>%
  group_by(Customer) %>%
  pivot_wider(names_from = "Snapshot", values_from = "Payment")

Here are some resources for learning R:

https://swirlstats.com/students.html

https://www.tidyverse.org/learn/

Mark
  • 7,785
  • 2
  • 14
  • 34
  • Hello, I have tried and here is my code for importing excel into R. library(readxl) library(data.table) library(tidyverse) library(lubridate) final_transactional_data = fread("Outputs/00/raw_transactional_all.csv", select = c(PN_number = "character", Balance = "numeric", snapshot_date_from_filename = "POSIXct")) df %>% group_by(PN_number) %>% pivot_wider(names_from = "snapshot_date_from_filename", values_from = "Balance") But I received an error. – Yhan Jul 21 '23 at 03:52
  • Error in UseMethod("group_by") : no applicable method for 'group_by' applied to an object of class "function" – Yhan Jul 21 '23 at 03:53
  • is PN_number a function? – Mark Jul 21 '23 at 03:54
  • PN_number is customer ID – Yhan Jul 21 '23 at 03:57
  • ah okay, the file is a CSV. If you could add the contents of it to your question, that would be great. Otherwise, this should work: `df <- read_csv("Outputs/00/raw_transactional_all.csv")` – Mark Jul 21 '23 at 04:02
  • I think it's good to start with examples from a course or an online tutorial of some sort, as you get an idea of what best practice is, and how things are done. You can then use those examples as a basis for your own R adventures :-) – Mark Jul 21 '23 at 04:04
  • to ***your question***, not the comments – Mark Jul 21 '23 at 04:28
  • Thank you! It works! However, I want to appear my customer once (unique) like as shown in my question T__T. I – Yhan Jul 21 '23 at 04:33
  • for me it is unique Yhan! Once again, I can't help you with issues with your data if I don't have access to at least a subset of it (the subset that is causing you trouble) – Mark Jul 21 '23 at 04:40
  • 1
    `group_by(Customer)` is redundant. I.e. `df %>% pivot_wider(id_cols = Customer, names_from = Snapshot, values_from = Payment)`. You could even skip `id_cols = Customer` because `id_cols` defaults to all columns in data except for the columns specified through `names_from` and `values_from`. – Darren Tsai Jul 21 '23 at 04:54
  • with respect, I don't think this is the place to be golfing – Mark Jul 21 '23 at 04:58
  • It's not golfing...`group_by(Customer)` is an unnecessary part in your code. Redundant code just leads to computational burden, right? – Darren Tsai Jul 21 '23 at 05:28