2

I am new to R, I have a csv file that contains values:

A, , ,
,B, ,
, ,C1,
, , ,D1
, , ,D2
, ,C2,
, , ,D3
, , ,D4

Loading the data into a data frame:

dat = read.csv("~/RData/test.csv", header = FALSE)
dat
#   V1 V2 V3 V4
# 1  A         
# 2     B      
# 3       C1   
# 4          D1
# 5          D2
# 6       C2   
# 7          D3
# 8          D4

I need to wrangle this to a data frame format:

A,B,C1,D1
A,B,C1,D2
A,B,C2,D3
A,B,C2,D4

Thanks in advance!

www
  • 38,575
  • 12
  • 48
  • 84
wingnut
  • 73
  • 5
  • Welcome to SO. Can you post the result of `dput(dat)`? – csgroen Oct 10 '17 at 14:29
  • Looks like you could do a Last Observation Carried Forward/backward [as here](https://stackoverflow.com/q/2776135/903061) and then remove duplicates, [as here](https://stackoverflow.com/q/13967063/903061). – Gregor Thomas Oct 10 '17 at 14:31
  • Possible duplicate of [Push up and tighten Dataframe. General solution](https://stackoverflow.com/questions/46490524/push-up-and-tighten-dataframe-general-solution) – Andre Elrico Oct 10 '17 at 14:39

2 Answers2

2

By using zoo

library(zoo)
df[df==' '] <- NA
df[1:3] <- lapply(df[1:3], na.locf0, fromLast = FALSE)
df <- df[!is.na(df$V4),]
df

giving:

  V1 V2 V3 V4
4  A  B C1 D1
5  A  B C1 D2
7  A  B C2 D3
8  A  B C2 D4

or by using magrittr too we can write the above code in terms of this pipeline:

library(magrittr)
library(zoo)

df %>% 
   replace(. == ' ', NA) %>%
   replace(1:3, lapply(.[1:3], na.locf0, fromLast = FALSE)) %>%     
   subset(!is.na(V4))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
BENY
  • 317,841
  • 20
  • 164
  • 234
2

A solution using dplyr and tidyr. This solution follows the link in Gregor's comments. But instead of using zoo package, here I show the use of fill function from tidyr, na.omit from base R, and distinct function from dplyr.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  fill(everything(), .direction = "down") %>%
  na.omit() %>%
  distinct(V4, .keep_all = TRUE)
dt2
  V1 V2 V3 V4
1  A  B C1 D1
2  A  B C1 D2
3  A  B C2 D3
4  A  B C2 D4

DATA

dt <- read.table(text = "V1 V2 V3 V4
1  A NA NA NA         
                 2  NA  B NA NA      
                 3  NA  NA  C1 NA   
                 4  NA  NA  NA D1
                 5  NA  NA  NA D2
                 6  NA  NA  C2 NA   
                 7  NA  NA  NA D3
                 8  NA  NA  NA D4",
                 header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84