0

I am not sure on how to name my problem and therefore was unable to search for it properly.

The problem I have is, that I converted a json-txt-file in R into a dataframe but the values of my variables a scattered all about resulting in 37200 observations with lots of empty cells (NA) where there should only be 600.

It looks something like this:

   Var1    Var2   Var3  Var4   Var5   Var6   ID
1   4       NA     NA    NA     NA     NA    1
2   NA      3      2     NA     NA     NA    1
3   NA      NA     NA    1      8      NA    1
4   NA      NA     NA    NA     NA     7     1

What I want however, is the following:

    Var1    Var2   Var3  Var4   Var5   Var6   ID
1   4       3      2     1       8     7      1

So I don‘t know if this is a classic long-wide format problem or if its different (I actually have all the variables/columns I want already) Searching the former didn’t really bring me a propper solution. I had a somewhat similar problem once that I solved with creating lagged variables - however, in this dataset I have 146 variables, so I was wondering if there’s an easier solution?

This question : NA values and extra rows when spreading repeated measures of multiple variables into wide format? goes a bit in the direction of my problem, however with 146 variables I still hope there might be a better solution.

Jessi
  • 5
  • 2
  • 1
    I suspect improving the import from JSON will do more good than trying to fix it afterwards - is it possible to try and address that? What's the structure of the JSON? What code did you use to import it? – Paul Stafford Allen Jan 06 '23 at 08:12
  • Try this with `dplyr` `df %>% group_by(ID) %>% summarise_all(funs(.[!is.na(.)][1]))` – Paul Stafford Allen Jan 06 '23 at 08:18
  • Plenty of ways here https://stackoverflow.com/questions/45515218/combine-rows-in-data-frame-containing-na-to-make-complete-row – harre Jan 06 '23 at 08:22

1 Answers1

0
library(tidyverse)

df %>% 
  pivot_longer(-ID) %>% 
  drop_na() %>%  
  pivot_wider(names_from = name, 
              values_from = value)

# A tibble: 1 × 7
     ID  Var1  Var2  Var3  Var4  Var5  Var6
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     4     3     2     1     8     7
Chamkrai
  • 5,912
  • 1
  • 4
  • 14