I've been using R for data analysis and am trying to learn Python. In R, I can create vectors with c(), which gives me back a single flat "column" built from whatever I pass it. I often use it to concatenate sequences or repeated values. Something like this:

> test <- c(rep(1:2, each = 2), seq(5, 10, by = 2), runif(3))
> test
 [1] 1.0000000 1.0000000 2.0000000 2.0000000 5.0000000 7.0000000 9.0000000
 [8] 0.9237168 0.5051230 0.2367923

What is the pythonic way to do this (guessing with pandas or numpy)?


This question is the closest I've found, but it's only putting together range() objects. Trying to do the above in python, storing the output as a pd.Series, I tried:

import numpy as np
import pandas as pd

test = pd.Series([np.repeat([1, 2], 2), 
                  np.arange(5, 10, 2),
                  np.random.random_sample(3)])

That gets me a sort of nested thing:

0                                        [1, 1, 2, 2]
1                                           [5, 7, 9]
2    [0.989736164378, 0.558979301843, 0.385354683044]
dtype: object

I see that I could flatten the list manually but that seems like overkill. I magically googled onto this question which contained the potentially helpful tolist() function which I hadn't heard of. While that's about getting some row of dataframes (??) into a pd.Series, the function seems like it might do the trick?

Combining the fact that I can use + to concatenate lists (gleaned from the first linked question) with the tolist() bit from the last one, I came up with this:

test1 = np.repeat([1, 2], 2).tolist()
test2 = np.arange(5, 10, 2).tolist()
test3 = np.random.random_sample(3).tolist()

test = pd.Series(test1 + test2 + test3)

0    1.000000
1    1.000000
2    2.000000
3    2.000000
4    5.000000
5    7.000000
6    9.000000
7    0.472650
8    0.077398
9    0.672734
dtype: float64

Hopefully what I'm trying to do is clear. I like that with c(), you pass in whatever you want and can elegantly string together a series of generated numbers in a desired pattern. I was surprised by how tough it was to do this with a pd.Series and infer from that I'm doing it wrong!

How is this typically done with python?

Psidom
Hendy

1 Answer

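If you start with numpy arrays, you can use numpy.concatenate (the pd.np alias used here was deprecated in pandas 1.0 and removed in 2.0; on current versions call np.concatenate directly):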

pd.np.concatenate([np.repeat([1, 2], 2), np.arange(5, 10, 2), np.random.random_sample(3)])
#array([ 1.        ,  1.        ,  2.        ,  2.        ,  5.        ,
#        7.        ,  9.        ,  0.61116272,  0.48863116,  0.84436643])
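For reference, here is a minimal sketch of the same call that imports numpy directly instead of going through `pd.np` and wraps the result in a `pd.Series`, as the comments below suggest. The seeded generator and the variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator; the seed is an arbitrary choice

# Rough analogue of c(rep(1:2, each = 2), seq(5, 10, by = 2), runif(3))
values = np.concatenate([
    np.repeat([1, 2], 2),   # 1, 1, 2, 2
    np.arange(5, 10, 2),    # 5, 7, 9 (the stop value is exclusive)
    rng.random(3),          # three uniform draws in [0, 1)
])

test = pd.Series(values)    # one flat Series; integers are upcast to float64
```

Because the pieces are joined as arrays before the Series is built, there is no nesting to flatten afterwards.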

If you start with pandas.Series objects, you can append one series to another (Series.append was deprecated in pandas 1.4 and removed in 2.0, so this form only works on older versions):

s1 = pd.Series(np.repeat([1, 2], 2))
s2 = pd.Series(np.arange(5, 10, 2))
s3 = pd.Series(np.random.random_sample(3))

s1.append([s2, s3], ignore_index=True)
#0    1.000000
#1    1.000000
#2    2.000000
#3    2.000000
#4    5.000000
#5    7.000000
#6    9.000000
#7    0.766968
#8    0.730897
#9    0.196995
#dtype: float64

Or use the pd.concat function:

pd.concat([s1, s2, s3], ignore_index=True)
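A self-contained sketch of the `pd.concat` route, which also shows what `ignore_index=True` changes. The names `combined` and `kept` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.repeat([1, 2], 2))
s2 = pd.Series(np.arange(5, 10, 2))
s3 = pd.Series(np.random.random_sample(3))

# ignore_index=True discards the original labels and renumbers from 0.
combined = pd.concat([s1, s2, s3], ignore_index=True)

# Without it, every piece keeps its own labels, so the index contains duplicates.
kept = pd.concat([s1, s2, s3])
```

Here `combined` is indexed 0 through 9, while `kept` carries the labels 0..3, 0..2, 0..2, which can surprise you in later label-based lookups.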
Psidom
  • Thanks for this. Not quite as short as I'd have hoped, but this is helpful. Did you mean to have `pd.np.concatenate(...)`? I wondered if you might have meant `pd.Series(np.concatenate(...))`? – Hendy Aug 28 '17 at 03:15
  • Sure, this can't beat `c()` :). If you want a *Series* object as the result, then you need to wrap it as `pd.Series(np.concatenate(...))`. `pd.np.concatenate` is also valid syntax, which saves you from importing `numpy` separately if you have already imported `pandas`. – Psidom Aug 28 '17 at 03:19
  • Good to know. I likely don't really know if I need a series or not as I'm so new. I was feeding this back into a `pd.DataFrame` as a new column, so I'm assuming so. In either case, this was great to work through! – Hendy Aug 28 '17 at 03:34
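On the last point about feeding the result into a `pd.DataFrame`: a plain numpy array can usually be assigned as a column directly, so the `pd.Series` wrapper is not strictly needed for that use. A minimal sketch, assuming a hypothetical ten-row frame with a made-up `id` column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's data; the column names are made up.
df = pd.DataFrame({"id": range(10)})

# A plain array is accepted as a new column as long as its length matches the frame.
df["test"] = np.concatenate([
    np.repeat([1, 2], 2),
    np.arange(5, 10, 2),
    np.random.random_sample(3),
])
```

The array is matched to rows by position here, whereas assigning a Series aligns on index labels instead.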