R Data wrangling

Question

I am new to R, I have a csv file that contains values:

A, , ,
,B, ,
, ,C1,
, , ,D1
, , ,D2
, ,C2,
, , ,D3
, , ,D4

Loading the data into a data frame:

dat = read.csv("~/RData/test.csv", header = FALSE)
dat
#   V1 V2 V3 V4
# 1  A         
# 2     B      
# 3       C1   
# 4          D1
# 5          D2
# 6       C2   
# 7          D3
# 8          D4

I need to wrangle this to a data frame format:

A,B,C1,D1
A,B,C1,D2
A,B,C2,D3
A,B,C2,D4

Thanks in advance!

Looks like you could do a Last Observation Carried Forward/backward [as here](https://stackoverflow.com/q/2776135/903061) and then remove duplicates, [as here](https://stackoverflow.com/q/13967063/903061). — Gregor Thomas, Oct 10 '17 at 14:31
Possible duplicate of [Push up and tighten Dataframe. General solution](https://stackoverflow.com/questions/46490524/push-up-and-tighten-dataframe-general-solution) — Andre Elrico, Oct 10 '17 at 14:39

score 2 · Answer 1 · edited Oct 10 '17 at 17:28

2

By using zoo

library(zoo)
df[df==' '] <- NA
df[1:3] <- lapply(df[1:3], na.locf0, fromLast = FALSE)
df <- df[!is.na(df$V4),]
df

giving:

  V1 V2 V3 V4
4  A  B C1 D1
5  A  B C1 D2
7  A  B C2 D3
8  A  B C2 D4

or by using magrittr too we can write the above code in terms of this pipeline:

library(magrittr)
library(zoo)

df %>% 
   replace(. == ' ', NA) %>%
   replace(1:3, lapply(.[1:3], na.locf0, fromLast = FALSE)) %>%     
   subset(!is.na(V4))

edited Oct 10 '17 at 17:28

G. Grothendieck

254,981
17
203
341

answered Oct 10 '17 at 15:03

BENY

317,841
20
164
234

@G.Grothendieck feel free to modify it ;-) – BENY Oct 10 '17 at 17:07

score 2 · Accepted Answer · answered Oct 10 '17 at 15:08

A solution using dplyr and tidyr. This solution follows the link in Gregor's comments. But instead of using zoo package, here I show the use of fill function from tidyr, na.omit from base R, and distinct function from dplyr.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  fill(everything(), .direction = "down") %>%
  na.omit() %>%
  distinct(V4, .keep_all = TRUE)
dt2
  V1 V2 V3 V4
1  A  B C1 D1
2  A  B C1 D2
3  A  B C2 D3
4  A  B C2 D4

DATA

dt <- read.table(text = "V1 V2 V3 V4
1  A NA NA NA         
                 2  NA  B NA NA      
                 3  NA  NA  C1 NA   
                 4  NA  NA  NA D1
                 5  NA  NA  NA D2
                 6  NA  NA  C2 NA   
                 7  NA  NA  NA D3
                 8  NA  NA  NA D4",
                 header = TRUE, stringsAsFactors = FALSE)

Thanks allot, this is elegant I used this as basis. – wingnut Oct 11 '17 at 15:12 — wingnut, Oct 11 '17 at 15:12

R Data wrangling

2 Answers2