Questions tagged [data-wrangling]

1242 questions
127
votes
8 answers

Good alternative to Pandas .append() method, now that it is being deprecated?

I use the following method a lot to append a single row to a dataframe. One thing I really like about it is that it allows you to append a simple dict object. For example: # Creating an empty dataframe df = pd.DataFrame(columns=['a', 'b']) #…
Glenn
  • 4,195
  • 9
  • 33
  • 41
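
For the `.append()` question above, one commonly suggested replacement is to collect the row dicts in a plain list and build the frame once, or to wrap a single dict in a one-row DataFrame and use `pd.concat`. The sketch below is not taken from the thread's answers; the column names `a` and `b` come from the excerpt, while the row values are made up for illustration.

```python
import pandas as pd

# Collect rows as plain dicts first (values here are hypothetical).
rows = []
rows.append({"a": 1, "b": 2})
rows.append({"a": 3, "b": 4})

# Build the DataFrame once from the list of dicts.
df = pd.DataFrame(rows, columns=["a", "b"])

# To add one more dict to an existing frame, wrap it in a one-row
# DataFrame and concatenate; ignore_index renumbers the rows from 0.
extra = pd.DataFrame([{"a": 5, "b": 6}])
df = pd.concat([df, extra], ignore_index=True)

print(df)
```

Building from a list avoids copying the whole frame on every added row, which is what repeated concatenation in a loop would do.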
17
votes
5 answers

How to swap the column and row entries in R

library(data.table) dat1 <- data.table(id = c(1, 2, 34, 99), class = c("sports", "", "music, sports", ""), hobby = c("knitting, music, sports", "", "", "music")) > dat1 id class hobby 1 1 …
Adrian
  • 9,229
  • 24
  • 74
  • 132
14
votes
1 answer

R: Changing column names in pivot_wider() -- suffix to prefix

I'm trying to figure out how to alter the way in which tidyr's pivot_wider() function creates new variable names in resulting wide data sets. Specifically, I would like the "names_from" variable to be added to the prefix of the new variables rather…
mkpcr
  • 431
  • 1
  • 3
  • 13
7
votes
3 answers

How to summarise a dataframe retaining all the columns

Consider the following dataframe: dummy_df <- tibble( A=c("ABC", "ABC", "BCD", "CDF", "CDF", "CDF"), B=c(0.25, 0.25, 1.23, 0.58, 0.58, 0.58), C=c("lorem", "ipsum", "dolor", "amet", "something", "else"), D=c("up", "up", "down", "down",…
jpm92
  • 143
  • 1
  • 8
5
votes
2 answers

Pandas: normalize values by group

I find it hard to explain with words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like…
Max Skoryk
  • 404
  • 2
  • 10
5
votes
3 answers

New column based on values from other columns AND respecting pre-established rules

I'm looking for an algorithm to create a new column based on values from other columns AND respecting pre-established rules. Here's an example: artificial data df = data.frame( col_1 =…
Henrique
  • 146
  • 7
5
votes
1 answer

Mutate across multiple columns using dplyr

I am trying to calculate rowwise averages for a number of columns. Could somebody please explain why the code below only calculates the mean for the two variables in the code (var_1 and var_13), rather than the mean for all 13 columns? df %>%…
EvieeG
  • 53
  • 3
5
votes
3 answers

Transforming complete age from character to numeric in R

I have a dataset with people's complete age as strings (e.g., "10 years 8 months 23 days") in R, and I need to transform it into a numeric variable that makes sense. I'm thinking about converting it to how many days of age the person has (which is…
Ruam Pimentel
  • 1,288
  • 4
  • 16
5
votes
3 answers

How do I create new columns based on the values of a different column and count the percentage value of another numerical column in R?

The sample data frame: no <- rep(1:5, each=2) type <- rep(LETTERS[1:2], times=5) set.seed(4) value <- round(runif(10, 10, 30)) df <- data.frame(no, type, value) df no type value 1 1 A 22 2 1 B 10 3 2 A 16 4 2 B …
Shibaprasadb
  • 1,307
  • 1
  • 7
  • 22
5
votes
1 answer

Julia. Summarise one column into a new DataFrame with multiple columns

I need to group a dataframe by one variable and then summarise it by adding the number of rows (I can already do this) and a number of columns relative to the .25, .5, .75 quantiles of another variable. In R I would do e.g.: iris %>% …
Bakaburg
  • 3,165
  • 4
  • 32
  • 64
5
votes
3 answers

Create new dataframe by dividing all possibles columns combination from another table

I'm struggling to find an easy and fast solution to create a new data frame by multiplying all "groups" of columns between them. Data for example a1 <- rnorm(n = 10) b1 <- rnorm(n = 10) c1 <- rnorm(n = 10) a2 <- rnorm(n = 10) b2 <- rnorm(n = 10) c2 <-…
Ian.T
  • 1,016
  • 1
  • 9
  • 19
5
votes
4 answers

How to write an efficient wrapper for data wrangling, allowing to turn off any wrapped part when calling the wrapper

To streamline data wrangling, I write a wrapper function consisting of several "verb functions" that process the data. Each one performs one task on the data. However, not all tasks are applicable to all datasets that pass through this process, and…
Emman
  • 3,695
  • 2
  • 20
  • 44
5
votes
1 answer

Data manipulation in Pandas: create a boolean column from values on column then fill with value from yet another column

OK, I've been trying this for too long; time to ask for help. I have a dataframe that looks a bit like this: person fruit quantity all_fruits 0 p1 grapes 2 [grapes, banana] 1 p1 banana 1 [grapes, banana] 2 p2…
4
votes
3 answers

Tidyverse column-wise differences

Suppose I have a data frame like this: df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4)) I want to add columns having column-wise differences, that is: diffA = postA - preA diffB = postB - preB and so on... Is…
Ravi
  • 41
  • 3
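
The column-wise differences asked for above follow a simple pattern: find every `pre*` column, pair it with the matching `post*` column, and subtract. The question targets the tidyverse; the sketch below shows the same pairing logic in Python with pandas, using the data from the excerpt. It is one possible approach, not the thread's accepted answer.

```python
import pandas as pd

# Data from the question excerpt.
df = pd.DataFrame({
    "preA": [1, 2, 3],
    "preB": [3, 4, 5],
    "postA": [6, 7, 8],
    "postB": [9, 8, 4],
})

# Derive the shared suffixes ("A", "B", ...) from the "pre" columns.
suffixes = [col[len("pre"):] for col in df.columns if col.startswith("pre")]

# For each suffix, add diff<suffix> = post<suffix> - pre<suffix>.
for suffix in suffixes:
    df[f"diff{suffix}"] = df[f"post{suffix}"] - df[f"pre{suffix}"]

print(df)
```

Because the suffixes are read from the column names, the loop extends to further `pre`/`post` pairs without changes, provided every `pre` column has a `post` counterpart.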
4
votes
2 answers

Remove rows of a certain value, before values change in R

I have a data frame like the following: dat <- data.frame(Target = c(rep("01", times = 8), rep("02", times = 5), rep("03", times = 4)), targ2clicks = c(1, 1, 1, 1, 0, 0 ,0 , 1, 1, 0, 0, 0, 1, …