Questions tagged [dfply]

A Python package that mimics the data manipulation functionality of R's dplyr.

20 questions
5 votes, 1 answer

dfply: Mutating string column: TypeError

My pandas dataframe contains a column "file" whose values are strings holding a file path. I am trying to use dfply to mutate this column like resultstatsDF.reset_index() >> mutate(dirfile =…
Make42
4 votes, 3 answers

Converting R code into Python code

Working code in R: library(dplyr) tmp <- test %>% group_by(InvoiceDocNumber) %>% summarise(invoiceprob=max(itemprob)) %>% mutate(invoicerank=rank(desc(invoiceprob))) But I want to rewrite the code in Python. I wrote the…
vinay karagod
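A plain-pandas sketch of the R pipeline in this question follows; it is not dfply and not the accepted answer. The sample rows are invented for illustration, and only the column names InvoiceDocNumber and itemprob come from the question. It assumes R's default tie handling in rank(), which averages tied ranks.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the R code.
test = pd.DataFrame({
    "InvoiceDocNumber": ["A", "A", "B", "C", "C"],
    "itemprob": [0.2, 0.9, 0.5, 0.7, 0.1],
})

tmp = (
    test.groupby("InvoiceDocNumber", as_index=False)  # group_by(InvoiceDocNumber)
        .agg(invoiceprob=("itemprob", "max"))         # summarise(invoiceprob = max(itemprob))
)

# mutate(invoicerank = rank(desc(invoiceprob))): rank from highest to lowest,
# averaging ties like R's default ties.method = "average".
tmp["invoicerank"] = tmp["invoiceprob"].rank(ascending=False, method="average")
print(tmp)
```

Because summarise() drops the single grouping level in R, the rank runs across all invoices, which is why the pandas version ranks the aggregated frame as a whole rather than within groups.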
3 votes, 1 answer

Python dfply package - Joins

Coming from R, I am trying to replicate dplyr with the dfply package in Python and need some help with two questions. How do I join two datasets if the join columns have different names? Is there a way to join on more than one column? As per…
Murali
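Both questions map onto DataFrame.merge in plain pandas; the sketch below uses that rather than dfply's join verbs. The tables and the column names id, customer_id, year and yr are hypothetical.

```python
import pandas as pd

# Hypothetical tables: the keys are named differently on each side.
left = pd.DataFrame({"id": [1, 2, 3], "year": [2020, 2020, 2021], "x": ["a", "b", "c"]})
right = pd.DataFrame({"customer_id": [1, 2, 3], "yr": [2020, 2021, 2021], "y": [10, 20, 30]})

# left_on / right_on are paired position by position, so this joins
# id <-> customer_id AND year <-> yr (two key columns at once).
joined = left.merge(
    right,
    how="inner",
    left_on=["id", "year"],
    right_on=["customer_id", "yr"],
)
print(joined)
```

Only the rows whose (id, year) pair appears on the right survive the inner join; both key pairs are kept as separate columns because their names differ.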
2 votes, 1 answer

Groupby a column and then compare two other columns and return a value in a different column

I have a dataframe similar to this data={'COMB':["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"], 'FROM':["MAA", "BLR", "DEL", "TRV", "HYD"], 'TO':["BLR", "MAA", "MAA", "HYD", "TRV"]} md=pd.DataFrame(data) md What I want to do is to…
Prabhu
1 vote, 3 answers

Conditional dataframe manipulation by row

Say I have a df like and I want a df like this How would I do this in Python or R? This would be so easy in Excel with a simple IF statement, for example: c5 =IF(c2 = "X", "ccc", c4). I thought this would be simple in R too, but I tried df <- df…
user276238
1 vote, 1 answer

I'm getting memory addresses instead of values using dfply mutate + custom function

I'm trying out dfply as an alternative to Pandas apply and applymap. Given some fake data: import pandas as pd from dfply import * df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'], 'num':[10.00,…
Chuck
1 vote, 1 answer

Error when creating a function using dfply @dfpipe

I have a dataset "banks" where, if I do a groupby on a column named "jobs" to check the counts in each category, I could find the…
1 vote, 2 answers

group_by ModuleNotFoundError: No module named 'dfply.group'; 'dfply' is not a package

I am working in Spyder (Anaconda). I keep getting error messages since I started working on Windows. I have already tried this code on Linux and it worked! from dfply import * worked very well. from dfply import * from dfply.group import group_by…
CiaPy
1 vote, 1 answer

Create a column with ranges, Python

My data set is Churn_Modeling. I am looking to create a column called c_rating with the following ranges: (<500 = "very poor", 500-600 = "poor", 601-660 = "fair", 661-780 = "good", and >= 780 = "excellent"). Some example data: with columns in…
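One way to build such a column in plain pandas is pd.cut. This sketch assumes integer scores in a column named CreditScore (a hypothetical name, since the excerpt does not show the columns). It also resolves the overlap between "661-780" and ">= 780" by assigning 780 to "excellent".

```python
import numpy as np
import pandas as pd

# Hypothetical integer scores; "CreditScore" is an assumed column name.
df = pd.DataFrame({"CreditScore": [450, 500, 600, 601, 660, 661, 779, 780, 850]})

# right=False makes every interval closed on the left:
# [-inf, 500), [500, 601), [601, 661), [661, 780), [780, inf)
bins = [-np.inf, 500, 601, 661, 780, np.inf]
labels = ["very poor", "poor", "fair", "good", "excellent"]
df["c_rating"] = pd.cut(df["CreditScore"], bins=bins, labels=labels, right=False)
print(df)
```

With integer scores, the edges 601 and 661 reproduce the question's inclusive ranges 500-600 and 601-660 exactly.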
1 vote, 2 answers

Python equivalent to dplyr's ifelse

I'm converting code from R to Python and am looking for some help with mutating a new column based on other columns, using dfply syntax/piping. In this example, I want to subtract 2 from col1 if col2 is 'c', otherwise add 4 import pandas as pd import…
CoolGuyHasChillDay
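A vectorized pandas/NumPy counterpart to dplyr's ifelse is numpy.where. The sketch below is plain pandas rather than dfply piping. The rows and the output column name new_col are made up for illustration; col1 and col2 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the question's column names.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "c", "b", "c"]})

# Element-wise if/else: col1 - 2 where col2 == "c", otherwise col1 + 4.
df = df.assign(new_col=np.where(df["col2"] == "c", df["col1"] - 2, df["col1"] + 4))
print(df)
```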
1 vote, 1 answer

How to use a conditional statement with startswith() in Python - dfply?

I'm doing data wrangling in Python using the dfply package. I want to create a new variable "a06" from 'FC06' in the dataset data_a, so that: a06 = 1 if FC06[i] starts with the character "1" (e.g. FC06[i]=173); a06 = 2 if FC06[i] starts with the…
Elise1369
1 vote, 2 answers

dfply - Python - X name is undefined

I'm using the dfply package in Python, which mimics the dplyr package in R. This is the simple code I'm trying to run. I have the dataset 'data' already loaded in my environment and I just want to group by that variable. import dfply as dp …
Marco Fumagalli
1 vote, 0 answers

Python dfply Package join data by two columns

I need to join 2 datasets by 2 columns. It seems there is no such functionality in the 'dfply' package. Am I thinking right? Please help. pat_active = (patients >> inner_join(active, by = ('StationID','PracticeID')) ) Documentation about…
Murali
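In plain pandas, joining on two columns means passing a list of key names to the on parameter of merge, as sketched below with invented rows (only the table and column names come from the question). One hedged guess about the dfply snippet: if dfply's by argument interprets a tuple as a (left column, right column) pair, then by = ('StationID','PracticeID') would match StationID against PracticeID, and a list such as ['StationID', 'PracticeID'] would be the thing to try. Check this against the dfply source before relying on it.

```python
import pandas as pd

# Hypothetical rows; the key column names come from the question.
patients = pd.DataFrame({
    "StationID": [1, 1, 2],
    "PracticeID": [10, 11, 10],
    "patient": ["p1", "p2", "p3"],
})
active = pd.DataFrame({
    "StationID": [1, 2, 2],
    "PracticeID": [10, 10, 12],
    "active": [True, False, True],
})

# Inner join that requires BOTH StationID and PracticeID to match.
pat_active = patients.merge(active, how="inner", on=["StationID", "PracticeID"])
print(pat_active)
```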
1 vote, 0 answers

dfply.mutate does not work with pandas.to_datetime

I have a DataFrame for which hub2['time'] = pd.to_datetime(hub2.timestamp) works, but when I write hub2 >> mutate(time=pd.to_datetime(X.timestamp)) with https://github.com/kieferk/dfply I get the error Traceback (most recent call last): File…
Make42
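The excerpt stops before the traceback, so the cause is not shown. A plausible guess, not confirmed by the question, is that pd.to_datetime receives dfply's deferred X.timestamp expression rather than an actual Series; dfply provides a make_symbolic decorator for wrapping ordinary functions in that situation. The plain-pandas sketch below sidesteps the issue with DataFrame.assign, whose lambda receives the real DataFrame. The hub2 frame here is a small invented stand-in.

```python
import pandas as pd

# Hypothetical stand-in for hub2; "timestamp" holds date strings.
hub2 = pd.DataFrame({"timestamp": ["2019-01-01 10:00:00", "2019-01-02 11:30:00"]})

# The lambda is called with the DataFrame itself, so pd.to_datetime
# receives a concrete Series and returns datetime64 values.
hub2 = hub2.assign(time=lambda d: pd.to_datetime(d["timestamp"]))
print(hub2.dtypes)
```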
0 votes, 2 answers

It looks like a List but I can't index into it: ValueError: Length of values (2) does not match length of index (279999)

I am importing the CSV file from here: https://raw.githubusercontent.com/kwartler/Harvard_DataMining_Business_Student/master/BookDataSets/LaptopSales.csv This code works: from dfply import * import pandas as pd df =…
nicomp