Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.

236 questions
127
votes
8 answers

Good alternative to Pandas .append() method, now that it is being deprecated?

I use the following method a lot to append a single row to a dataframe. One thing I really like about it is that it allows you to append a simple dict object. For example: # Creating an empty dataframe df = pd.DataFrame(columns=['a', 'b']) #…
Glenn
  • 4,195
  • 9
  • 33
  • 41
93
votes
3 answers

Pandas merge two dataframes with different columns

I'm surely missing something simple here. Trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa. >df_may id quantity attr_1 attr_2 0…
economy
  • 4,035
  • 6
  • 29
  • 37
40
votes
10 answers

Strip white spaces from CSV file

I need to strip the white spaces from a CSV file that I read: import csv aList=[] with open(self.filename, 'r') as f: reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE) for row in reader: aList.append(row) # I need…
BAI
  • 571
  • 2
  • 6
  • 9
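For readers skimming this entry, here is a minimal sketch of one way to strip whitespace from every field while reading a CSV with the standard `csv` module. It is not taken from the question's answers; the helper name `read_stripped_rows` and the sample data are hypothetical, and the default quoting is assumed rather than the `csv.QUOTE_NONE` used in the excerpt.

```python
import csv
import io


def read_stripped_rows(lines):
    """Parse CSV lines and strip leading/trailing whitespace from each field.

    `lines` is any iterable of strings, such as an open text file.
    """
    reader = csv.reader(lines, delimiter=",")
    return [[field.strip() for field in row] for row in reader]


# Hypothetical sample standing in for the file opened in the excerpt.
sample = io.StringIO("a , b,c \n 1,2 , 3\n")
rows = read_stripped_rows(sample)
print(rows)
```

Passing `skipinitialspace=True` to `csv.reader` is a lighter alternative, but it only drops spaces that immediately follow the delimiter, so trailing spaces would remain.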
36
votes
6 answers

How to convert a python datetime.datetime to excel serial date number

I need to convert dates into Excel serial numbers for a data munging script I am writing. By playing with dates in my OpenOffice Calc workbook, I was able to deduce that '1-Jan 1899 00:00:00' maps to the number zero. I wrote the following function…
Homunculus Reticulli
  • 65,167
  • 81
  • 216
  • 341
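The asker's own function is cut off in the excerpt, so the following is only a sketch of the usual approach, not a reconstruction of it. It assumes the 30 December 1899 epoch, which matches Excel's 1900 date system for dates from 1 March 1900 onward; the names `EXCEL_EPOCH` and `to_excel_serial` are hypothetical.

```python
from datetime import datetime

# Assumption: serial 0 corresponds to 1899-12-30, which lines up with
# Excel's 1900 date system for dates on or after 1900-03-01.
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(dt):
    """Return the serial date number for a naive datetime.

    The integer part counts days since the epoch; the fractional part
    is the time of day.
    """
    delta = dt - EXCEL_EPOCH
    return delta.total_seconds() / 86400


print(to_excel_serial(datetime(2000, 1, 1)))         # 36526.0
print(to_excel_serial(datetime(2000, 1, 1, 12, 0)))  # 36526.5
```

Earlier dates need separate handling in Excel, because its 1900 date system treats 1900 as a leap year.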
27
votes
10 answers

Python: in-memory object database which supports indexing?

I'm doing some data munging which would be quite a bit simpler if I could stick a bunch of dictionaries in an in-memory database, then run simple queries against it. For example, something like: people = db([ {"name": "Joe", "age": 16}, …
David Wolever
  • 148,955
  • 89
  • 346
  • 502
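One possible approach, sketched here with the standard library's `sqlite3` module rather than any tool named in the question, is to load the dictionaries into an in-memory SQLite table and index the column being queried. The second record ("Jane"), the table name, and the index name are hypothetical additions for illustration.

```python
import sqlite3

# "Joe" comes from the excerpt; "Jane" is a made-up second record.
people = [
    {"name": "Joe", "age": 16},
    {"name": "Jane", "age": 22},
]

# ":memory:" keeps the whole database in RAM for the life of the connection.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted back to dicts
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_people_age ON people (age)")

# Named placeholders let each dict be inserted directly.
conn.executemany("INSERT INTO people (name, age) VALUES (:name, :age)", people)

adults = [
    dict(row)
    for row in conn.execute("SELECT name, age FROM people WHERE age >= ?", (18,))
]
print(adults)
```

This trades the convenience of arbitrary Python objects for SQL queries and real indexes, so it fits best when the dictionaries share a fixed set of keys.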
13
votes
5 answers

pandas copy value from one column to another if condition is met

I have a dataframe: df = col1 col2 col3 1 2 3 1 4 6 3 7 2 I want to edit df, such that when the value of col1 is smaller than 2 , take the value from col3. So I will get: new_df = col1 col2 col3 3 2 3 6 …
Cranjis
  • 1,590
  • 8
  • 31
  • 64
13
votes
2 answers

How to move my pandas dataframe to d3?

I am new to Python and have worked my way through a few books on it. Everything is great, except visualizations. I really dislike matplotlib and Bokeh requires too heavy of a stack. The workflow I want is: Data munging analysis using pandas in…
Anton
  • 4,765
  • 12
  • 36
  • 50
11
votes
2 answers

Which Perl modules are good for data munging?

Nine years ago when I started parsing HTML and free text with Perl I read the classic Data Munging with Perl. Does someone know if David is planning to update the book or if there are similar books or web pages where the new parsing modules like…
9
votes
2 answers

rm() function of r alternative in python

How do I remove variables in Python to free RAM? R: a = 2 rm(a) Python: a = 2 How do I clear a single variable or a group of variables?
koneru nikhil
  • 339
  • 2
  • 12
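As a rough illustration of the closest built-in equivalent, the sketch below uses Python's `del` statement, which unbinds one or more names much as `rm()` does in R. The function name `demo_del` and the variables `b` and `c` are hypothetical; only `a = 2` appears in the question.

```python
import gc


def demo_del():
    """Delete some local names and report which ones remain bound."""
    a = 2
    b = [0] * 1_000_000  # a larger object, to make freeing worthwhile
    c = "still here"

    del a, b      # unbind one or several names, like rm(a, b) in R
    gc.collect()  # optional: also reclaims objects caught in reference cycles

    return sorted(locals())


print(demo_del())  # ['c']
```

`del` removes the name, not necessarily the object: the memory is released only once no other references to the object remain, and the interpreter may not hand it back to the operating system right away.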
6
votes
4 answers

openxlsx::write.xlsx overwriting existing worksheet instead of appending

The openxlsx::write.xlsx function is overwriting the spreadsheet instead of adding another tab. I tried to follow some guidance on Stack Overflow, but without success. dt.escrita <- format(Sys.time(), '%Y%m%d%H%M%S') write.xlsx( tbl.messages …
Rafael Lima
  • 420
  • 1
  • 5
  • 16
6
votes
1 answer

Unexpected results of min() and max() methods of Pandas series made of Timestamp objects

I encountered this behaviour when doing basic data munging, like in this example: In [55]: import pandas as pd In [56]: import numpy as np In [57]: rng = pd.date_range('1/1/2000', periods=10, freq='4h') In [58]: lvls =…
LukaszJ
  • 145
  • 2
  • 6
5
votes
3 answers

scripting with C#?

I have used Python extensively for various ad hoc data munging and ancillary tasks. Since I am learning C#, I figure it would be fun to see if I can rewrite some of these scripts in C#. Is there an executable available that takes a .cs file and…
voidstar
  • 161
  • 1
  • 4
5
votes
5 answers

melt column by substring of the columns name in pandas (python)

I have a dataframe: subject A_target_word_gd A_target_word_fd B_target_word_gd B_target_word_fd subject_type 1 1 2 3 4 mild 2 …
Cranjis
  • 1,590
  • 8
  • 31
  • 64
5
votes
6 answers

How to do a sort of mixed values in R

I have a data frame that I want to sort by one column, then the next (using tidyverse if possible). I checked the question below, but the solutions did not seem to work: Order a "mixed" vector (numbers with letters). Sample code for an…
Jordan
  • 1,415
  • 3
  • 18
  • 44
5
votes
2 answers

Data munging in pandas

I have a CSV file with lines that look like: ID,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#, I can read it in with #!/usr/bin/env python import pandas as pd import sys filename = sys.argv[1] df = pd.read_csv(filename) Given a…
Simd
  • 19,447
  • 42
  • 136
  • 271