Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, then summarizing and examining it, and testing various hypotheses about the data.

More information can be found on Wikipedia's Data Analysis page.

4642 questions
476
votes
4 answers

How to sort a dataFrame in python pandas by two or more columns?

Suppose I have a dataframe with columns a, b and c, I want to sort the dataframe by column b in ascending order, and by column c in descending order, how do I do this?
Rakesh Adhikesavan
  • 11,966
  • 18
  • 51
  • 76
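
For the sorting question above, one possible approach is `DataFrame.sort_values` with a list of columns and a matching list of `ascending` flags. This is a minimal sketch, not the accepted answer; the sample values in `df` are made up for illustration, and only the column names a, b and c come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 1, 2, 1],
    "c": [10, 20, 30, 40],
})

# Sort by b ascending, then by c descending within each value of b.
sorted_df = df.sort_values(by=["b", "c"], ascending=[True, False])
print(sorted_df)
```

The `ascending` list is matched to `by` position by position, so each column gets its own sort direction.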
395
votes
40 answers

Peak signal detection in realtime timeseries data

Update: The best performing algorithm so far is this one. This question explores robust algorithms for detecting sudden peaks in real-time timeseries data. Consider the following example data: Example of this data is in Matlab format (but this…
179
votes
13 answers

How to merge multiple dataframes

I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'), to do it with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date'), however…
Vasco Ferreira
  • 2,151
  • 2
  • 15
  • 21
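
For the merge question above, a common way to avoid nesting `merge` calls is to fold a list of dataframes with `functools.reduce`. This is a sketch under assumptions: the frames `df1`, `df2`, `df3` and their value columns `x`, `y`, `z` are hypothetical; only the shared `date` key comes from the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share a 'date' column.
df1 = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "x": [1, 2]})
df2 = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "y": [3, 4]})
df3 = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "z": [5, 6]})

frames = [df1, df2, df3]

# Merge pairwise from left to right on the shared key (inner join by default).
merged = reduce(lambda left, right: left.merge(right, on="date"), frames)
print(merged)
```

Because `merge` defaults to an inner join, dates missing from any one frame are dropped; passing `how="outer"` inside the lambda would keep them.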
132
votes
3 answers

Why does one hot encoding improve machine learning performance?

I have noticed that when One Hot encoding is used on a particular data set (a matrix) and used as training data for learning algorithms, it gives significantly better results with respect to prediction accuracy, compared to using the original matrix…
maheshakya
  • 2,198
  • 7
  • 28
  • 43
95
votes
8 answers

How do I change a single index value in pandas dataframe?

energy.loc['Republic of Korea'] I want to change the value of index from 'Republic of Korea' to 'South Korea'. But the dataframe is too large and it is not possible to change every index value. How do I change only this single value?
user517696
  • 2,472
  • 7
  • 24
  • 35
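
For the index question above, one option is `DataFrame.rename` with an `index` mapping, which changes only the labels named in the mapping. A minimal sketch: the `energy` frame here is a made-up stand-in with a hypothetical `supply` column; only the two country labels come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'energy' dataframe in the question.
energy = pd.DataFrame(
    {"supply": [1.0, 2.0]},
    index=["Republic of Korea", "Japan"],
)

# Rename a single index label; labels not in the mapping are left untouched.
energy = energy.rename(index={"Republic of Korea": "South Korea"})
print(energy.index.tolist())
```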
93
votes
5 answers

Fitting polynomial model to data in R

I've read the answers to this question and they are quite helpful, but I need help. I have an example data set in R as follows: x <- c(32,64,96,118,126,144,152.5,158) y <- c(99.5,104.8,108.5,100,86,64,35.3,15) I want to fit a model to these data…
Mehper C. Palavuzlar
  • 10,089
  • 23
  • 56
  • 69
93
votes
3 answers

How do I sum values in a column that match a given condition using pandas?

Suppose I have a dataframe like so: a b 1 5 1 7 2 3 1 3 2 5 I want to sum up the values for b where a = 1, for example. This would give me 5 + 7 + 3 = 15. How do I do this in pandas?
adijo
  • 1,450
  • 1
  • 14
  • 20
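
For the conditional sum above, a boolean mask combined with `.loc` is one straightforward option. The sketch below uses the data from the question; the variable names `df` and `total` are illustrative.

```python
import pandas as pd

# Data as given in the question.
df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [5, 7, 3, 3, 5]})

# Select column b for the rows where a == 1, then sum it.
total = df.loc[df["a"] == 1, "b"].sum()
print(total)  # 5 + 7 + 3 = 15
```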
46
votes
6 answers

How to get rid of multilevel index after using pivot table pandas?

I had following data frame (the real data frame is much more larger than this one ) : sale_user_id sale_product_id count 1 1 1 1 8 1 1 52 1 1 …
chessosapiens
  • 3,159
  • 10
  • 36
  • 58
42
votes
3 answers

Group by two columns and count the occurrences of each combination in Pandas

I have the following data frame: data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']}) product_id user_id p1 a1 p1 a1 p2 …
chessosapiens
  • 3,159
  • 10
  • 36
  • 58
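
For the group-and-count question above, `groupby(...).size()` counts the rows in each combination of keys. This sketch reuses the dataframe from the question; the output column name `count` and the variable `counts` are assumptions chosen for illustration.

```python
import pandas as pd

# Data as given in the question.
data = pd.DataFrame({
    "user_id": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3"],
    "product_id": ["p1", "p1", "p2", "p1", "p1", "p1", "p2", "p2", "p3"],
})

# Count the rows for each (user_id, product_id) pair and flatten the result.
counts = (
    data.groupby(["user_id", "product_id"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

Only combinations that actually occur in the data appear in the result; pairs with zero occurrences are not listed.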
39
votes
1 answer

Plotting results of Pandas GroupBy

I'm starting to learn Pandas and am trying to find the most Pythonic (or panda-thonic?) ways to do certain tasks. Suppose we have a DataFrame with columns A, B, and C. Column A contains boolean values: each row's A value is either true or…
Maxim Zaslavsky
  • 17,787
  • 30
  • 107
  • 173
36
votes
12 answers

R and SPSS difference

I will be analysing vast amount of network traffic related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a…
sfactor
  • 12,592
  • 32
  • 102
  • 152
32
votes
4 answers

Plot pandas dataframe containing NaNs

I have GPS data of ice speed from three different GPS receivers. The data are in a pandas dataframe with an index of julian day (incremental from the start of 2009). This is a subset of the data (the main dataset is 3487235 rows...): …
ajt
  • 542
  • 2
  • 6
  • 12
31
votes
2 answers

how to get rid of pandas converting large numbers in excel sheet to exponential?

In the excel sheet , i have two columns with large numbers. But when i read the excel file with read_excel() and display the dataframe, those two columns are printed in scientific format with exponential. How can get rid of this format? Thanks…
Nathaniel Babalola
  • 617
  • 2
  • 6
  • 15
30
votes
5 answers

python pandas: how to calculate derivative/gradient

Given that I have the following two vectors: In [99]: time_index Out[99]: [1484942413, 1484942712, 1484943012, 1484943312, 1484943612, 1484943912, 1484944212, 1484944511, 1484944811, 1484945110] In [100]: bytes_in Out[100]:…
nskalis
  • 2,232
  • 8
  • 30
  • 49
30
votes
3 answers

How to find the closest word to a vector using word2vec

I have just started using Word2vec and I was wondering how can we find the closest word to a vector suppose. I have this vector which is the average vector for a set of vectors: array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32) Is…
sel
  • 942
  • 1
  • 12
  • 25