tweets1$index <- 1:nrow(tweets1)
tweets2 <- tweets1 %>% pivot_wider(names_from = "X1", values_from = "X2")
tweets2

This code creates this data frame:

[Screenshot of the resulting data frame]

I obviously don't want the NA values, and I'd like rows 1-3 combined into one row, and so on. Essentially, I'd like to group every three rows together. Can I get some advice on this?

r2evans
  • `tweets1$index <- rep(1:(nrow(tweets1)/3), each = 3)`? – Jon Spring Mar 25 '23 at 20:12
  • Doesn't work...I think the division part is necessary though – obothehobo Mar 25 '23 at 20:18
  • Without any sample data (as code, eg using the very handy `dput()` function), we will just be guessing what works for your data. Also, "doesn't work" is ambiguous -- did you get an error, or a different result that was wrong, or the same result as before? – Jon Spring Mar 25 '23 at 22:05
  • obothehobo, JonSpring requested that you provide sample data using `dput`, and since neither of our answers apparently resolves your issue, I suggest you give more thought to giving us sample data. See https://stackoverflow.com/q/5963269, [mcve] (minimal reproducible example), and https://stackoverflow.com/tags/r/info for how to update your question to make it clearer, both from an input and an expected-output perspective. – r2evans Mar 26 '23 at 13:19
  • Sorry for the confusion, this is my first post here so not too familiar. Thanks JonSpring for the help, that code ended up working once I deleted the last row of the data frame which included a "cutoff" tweet. After doing that, the number of rows was divisible by three so your suggested code worked. Thank you! – obothehobo Mar 26 '23 at 21:41
  • I made the assumption from the question that the OP had complete data with three entries per group and nothing extra. I edited my answer to show two approaches that could manage if that's not the case: 1) integer division would be robust to the last group having fewer than 3 entries; 2) counting cumulative `a` entries would be robust to groups missing a `b` or `c`. – Jon Spring Mar 28 '23 at 16:43
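The `rep(..., each = 3)` suggestion only lines up when the row count is a multiple of 3; otherwise `1:(nrow/3)` is truncated, the index vector comes out shorter than the data frame, and the assignment fails. As a hedged alternative to deleting the cut-off row, `rep()` also accepts `length.out`, which trims the repeated sequence to exactly `nrow()` values. A minimal sketch, using a hypothetical 7-row `tweets1` with the `X1`/`X2` layout assumed in the answers below:

```r
# Hypothetical stand-in for the OP's data: 7 rows, the last one a "cut-off" tweet
tweets1 <- data.frame(
  X1 = c("a", "b", "c", "a", "b", "c", "a"),
  X2 = c("blip", "blap", "blorp", "slip", "slop", "slorp", "cutoff")
)

n <- nrow(tweets1)

# 1:(n / 3) is 1:2 here, so rep(..., each = 3) yields only 6 values for 7 rows
short_index <- rep(1:(n / 3), each = 3)
length(short_index)

# ceiling() adds a group for the leftover row; length.out trims to exactly n values
tweets1$index <- rep(seq_len(ceiling(n / 3)), each = 3, length.out = n)
tweets1$index
```

The leftover row then lands in its own group 3, which can be dropped or kept after pivoting.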

2 Answers


Stemming off of:

  • "don't want the NA values", and
  • "rows 1-3 combined into 1 and so on"

Perhaps this will work:

dat <- data.frame(id = 1:6,
                  T = c("2009-06-11", NA, NA, "2009-06-11", NA, NA),
                  U = c(NA, "https://www1", NA, NA, "https://www2", NA),
                  W = c(NA, NA, "post1", NA, NA, "post2"))
dat
#   id          T            U     W
# 1  1 2009-06-11         <NA>  <NA>
# 2  2       <NA> https://www1  <NA>
# 3  3       <NA>         <NA> post1
# 4  4 2009-06-11         <NA>  <NA>
# 5  5       <NA> https://www2  <NA>
# 6  6       <NA>         <NA> post2

by(
  dat, cumsum(!is.na(dat$T)),
  function(Z) data.frame(lapply(Z[-1], function(x) paste(na.omit(x), collapse=";")[1]))
) |>
  do.call(rbind, args = _)
#            T            U     W
# 1 2009-06-11 https://www1 post1
# 2 2009-06-11 https://www2 post2

(The native pipe `|>` requires R >= 4.1.0 and the `_` placeholder requires R >= 4.2.0; this can be made to work with older R without difficulty.)
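For R versions without `|>` or `_`, one way to write the same computation is with ordinary function calls. This is a sketch of that rewrite, rebuilding `dat` so the block runs on its own; `stringsAsFactors = FALSE` is set explicitly because R before 4.0.0 defaulted to factors.

```r
dat <- data.frame(
  id = 1:6,
  T = c("2009-06-11", NA, NA, "2009-06-11", NA, NA),
  U = c(NA, "https://www1", NA, NA, "https://www2", NA),
  W = c(NA, NA, "post1", NA, NA, "post2"),
  stringsAsFactors = FALSE
)

# A new group starts on every row where T is non-NA
grp <- cumsum(!is.na(dat$T))

# For each group, drop the id column and collapse the non-NA values of each column
pieces <- by(dat, grp, function(Z) {
  data.frame(lapply(Z[-1], function(x) paste(na.omit(x), collapse = ";")),
             stringsAsFactors = FALSE)
})

# Stack the one-row data frames, as the |> version does
res <- do.call(rbind, pieces)
res
```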

r2evans

Assuming your original dataframe looks like this:

  X1    X2
1  a  blip
2  b  blap
3  c blorp
4  a  slip
5  b  slop
6  c slorp

EDIT: The old approach required the number of rows to be a multiple of 3.

In case the last group is partial or unrelated to the other data, we can use integer division to assign groups of three, with the last group possibly having fewer than 3 members.

df$index <- (seq_len(nrow(df)) - 1L) %/% 3L + 1L
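To see the integer-division grouping cope with a partial last group, here is a small sketch on a hypothetical 7-row version of the example data, where the extra row stands in for a cut-off tweet. `seq_len(nrow(df))` is used rather than `1:nrow(df)` so a zero-row data frame gives an empty index instead of `c(1, 0)`.

```r
# Hypothetical 7-row example: two complete groups plus one leftover row
df <- data.frame(
  X1 = c("a", "b", "c", "a", "b", "c", "a"),
  X2 = c("blip", "blap", "blorp", "slip", "slop", "slorp", "cutoff")
)

# Zero-based row positions integer-divided by 3 give 0, 0, 0, 1, 1, 1, 2;
# adding 1L makes group numbers start at 1 and keeps the column integer
df$index <- (seq_len(nrow(df)) - 1L) %/% 3L + 1L
df$index
```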

In case other groups might be partial, but the data is in order and every group starts with exactly one `a` row, we can start a new group at each appearance of `a` in `X1`:

df$index <- cumsum(df$X1 == "a")
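The counting approach does not need equal-sized groups, only that each group begins with its `a` row. A sketch with hypothetical data in which the first group has no `b` entry; in this long layout the `a` labels live in the `X1` column, so the test is `X1 == "a"`:

```r
# Hypothetical data: the first group is missing its "b" row
df <- data.frame(
  X1 = c("a", "c", "a", "b", "c"),
  X2 = c("blip", "blorp", "slip", "slop", "slorp")
)

# Each "a" row starts a new group; cumsum() over the logical vector numbers the groups
df$index <- cumsum(df$X1 == "a")
df$index
```

After `pivot_wider()`, the missing `b` for group 1 would simply show up as `NA`.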

Producing

  X1    X2 index
1  a  blip     1
2  b  blap     1
3  c blorp     1
4  a  slip     2
5  b  slop     2
6  c slorp     2

So that we can do:

df %>% pivot_wider(names_from = "X1", values_from = "X2")


# A tibble: 2 × 4
  index a     b     c    
  <int> <chr> <chr> <chr>
1     1 blip  blap  blorp
2     2 slip  slop  slorp
Jon Spring
  • (But the fact that this "doesn't work" suggests there is more to your question/data than what I can divine from your screenshot.) – Jon Spring Mar 25 '23 at 22:13
  • Edited to deal with potentially incomplete data for the last group, or incomplete data in other groups provided that all groups have an `a` value. – Jon Spring Mar 28 '23 at 16:35