A Python package that mimics R's dplyr-style data manipulation functionality.
Questions tagged [dfply]
20 questions
5
votes
1 answer
dfply: Mutating string column: TypeError
My pandas dataframe contains a column "file" whose values are file-path strings. I am trying to use dfply to mutate this column like
resultstatsDF.reset_index() >> mutate(dirfile =…

Make42
4
votes
3 answers
Converting R code into python code
working code in R
library(dplyr)
tmp <- test %>%
group_by(InvoiceDocNumber) %>%
summarise(invoiceprob=max(itemprob)) %>%
mutate(invoicerank=rank(desc(invoiceprob)))
But I want to rewrite the code in Python. I wrote the…

vinay karagod
3
votes
1 answer
Python dfply package - Joins
I am coming from R and trying to simulate dplyr with the dfply package in Python, and I need some help.
I have two questions:
How do I join two datasets if the join columns have different names?
Is there a way to join on more than one column? As per…

Murali
2
votes
1 answer
Groupby a column and then compare two other columns and return a value in a different column
I have a dataframe similar to this
data={'COMB':["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
'FROM':["MAA", "BLR", "DEL", "TRV", "HYD"],
'TO':["BLR", "MAA", "MAA", "HYD", "TRV"]}
md=pd.DataFrame(data)
md
What I want to do is to…

Prabhu
1
vote
3 answers
Conditional dataframe manipulation by row
Say I have a df like
and I want a df like this
How would I do this in Python or R? This would be so easy in Excel with a simple IF statement, for example: c5 =IF(c2 = "X", "ccc", c4).
I thought this would be simple in R too, but I tried
df <- df…

user276238
1
vote
1 answer
I'm getting memory address instead of values using dfply mutate + custom function
I'm trying out dfply as an alternative to Pandas apply and applymap. Given some fake data:
import pandas as pd
from dfply import *
df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'],
'num':[10.00,…

Chuck
1
vote
1 answer
Error when creating a function using dfply @dfpipe
I have a dataset "banks" where, if I do a groupby on a column named "jobs" to check the counts in each category, I could find the…

Shameek Mukherjee
1
vote
2 answers
group_by ModuleNotFoundError: No module named 'dfply.group'; 'dfply' is not a package
I am working in Spyder (Anaconda). I keep getting several error messages since I work on Windows. I have already tried this code on Linux and it worked! from dfply import * worked very well.
from dfply import *
from dfply.group import group_by…

CiaPy
1
vote
1 answer
Create a column with ranges, Python
My data set is Churn_Modeling:
I am looking to create a column called c_rating with the following ranges: (<500 = "very poor", 500-600 = "poor", 601-660 = "fair", 661-780 = "good", and >= 780 = "excellent").
Some example data: with columns in…

Alexandra McGill
1
vote
2 answers
Python equivalent to dplyr's ifelse
I'm converting code from R to Python and am looking for some help with mutating a new column based on other columns, using dfply syntax/piping
In this example, I want to subtract 2 from col1 if col2 is 'c', otherwise add 4
import pandas as pd
import…

CoolGuyHasChillDay
1
vote
1 answer
How to use a conditional statement with startswith() in Python - dfply?
I'm doing data wrangling in Python, using the dfply package.
I want to create a new variable "a06" from 'FC06' of the dataset data_a, so that:
a06 = 1 if FC06[i] starts with the character "1" (e.g. FC06[i]=173)
a06 = 2 if FC06[i] starts with the…

Elise1369
1
vote
2 answers
dfply - Python - X name is undefined
I'm using the dfply package in Python, which mimics the dplyr package in R.
This is the simple code I'm trying to run. I have this dataset 'data' previously loaded in my environment and I just want to group by that variable.
import dfply as dp
…

Marco Fumagalli
1
vote
0 answers
Python dfply Package join data by two columns
I need to join two datasets on two columns. There seems to be no such functionality in the 'dfply' package. Am I thinking right? Please help.
pat_active = (patients >>
inner_join(active, by = ('StationID','PracticeID'))
)
Documentation about…

Murali
1
vote
0 answers
dfply.mutate does not work with pandas.to_datetime
I have a DataFrame for which
hub2['time'] = pd.to_datetime(hub2.timestamp)
works, but when I write
hub2 >> mutate(time=pd.to_datetime(X.timestamp))
with https://github.com/kieferk/dfply
I get the error
Traceback (most recent call last):
File…

Make42
0
votes
2 answers
It looks like a List but I can't index into it: ValueError: Length of values (2) does not match length of index (279999)
I am importing the CSV file from here: https://raw.githubusercontent.com/kwartler/Harvard_DataMining_Business_Student/master/BookDataSets/LaptopSales.csv
This code works:
from dfply import *
import pandas as pd
df =…

nicomp