I have data like this, and I want the values in each id's list split out into a separate column:

id       data
2        [1.81744912347, 1.96313966807, 1.79290908923]
3        [0.87738744314, 0.154642653196, 0.319845728764]
4        [1.12289279512, 1.16105905267, 1.14889626137]
5        [1.65093687407, 1.65010263863, 1.65614839538]
6        [0.103623262651, 0.46093367049, 0.549343505693]
7        [0.122299243819, 0.355964399805, 0.40010681636]
8        [3.08321032223, 2.92526466342, 2.6504125359, 2]
9        [0.287041436848, 0.264107869667, 0.29319302508]
10       [0.673829091668, 0.632715325748, 0.47099544284]
11       [3.04589375431, 2.19130582148, 1.68173686657]

How can I transform the data in a pandas DataFrame so that it looks like the following?

id   data
1   1.61567967235
1   1.55256213176
1   1.16904355984
...
10  0.673829091668
10  0.632715325748

and so on

It's a large amount of data. If I use a loop to transform it, it kills the notebook. Is there any other way to process this data?

[Image: sample of the data]

id101112
  • @Wen it's not a duplicate question. If you look at the requirements, what I'm asking is completely separate. – id101112 Jul 23 '18 at 03:50
  • It is still almost the same. Look at my question: I convert that to a list like you show in your picture, then unnest it. – BENY Jul 23 '18 at 03:52

1 Answer

IIUC, starting from

col
0   [1, 2, 3]
1   [4, 5, 6]

you can do

df.col.apply(pd.Series).stack().reset_index(drop=True)

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

or

pd.Series([z for x in df.col.values for z in x])

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

Times:

%timeit df.col.apply(pd.Series).stack().reset_index(drop=True)
1.15 ms ± 26.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.Series([z for x in df.col.values for z in x])
89.2 µs ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
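
Both one-liners above return only the flattened values. To keep the repeated ids next to them (the layout asked for in the question), here is a minimal sketch. It assumes the columns are named `id` and `data` as in the question, uses a small made-up sample rather than the real data, and relies on `DataFrame.explode`, which needs pandas 0.25 or later. The second option avoids `explode` and should also work on older versions.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (column names assumed).
df = pd.DataFrame({
    "id": [2, 3, 8],
    "data": [[1.8, 1.9, 1.7], [0.8, 0.1, np.nan], [3.0, 2.9, 2.6, 2.0]],
})

# Option 1: explode gives one row per list element and repeats the id.
# NaN elements inside the lists are kept as rows.
long_explode = df.explode("data").reset_index(drop=True)
long_explode["data"] = long_explode["data"].astype(float)

# Option 2: repeat each id by the length of its list, then flatten the lists.
lengths = df["data"].str.len()
long_repeat = pd.DataFrame({
    "id": np.repeat(df["id"].to_numpy(), lengths.to_numpy()),
    "data": list(chain.from_iterable(df["data"])),
})
```

Neither option builds a wide intermediate frame the way `apply(pd.Series)` does, so lists of unequal length do not produce padding NaN columns.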
rafaelc
  • Thank you so much, this solution worked for me. – id101112 Jul 23 '18 at 04:06
  • I have a question: this method drops all the NaN values, which I don't want. How can I keep them? – id101112 Jul 24 '18 at 15:10
  • @id101112 just use `stack(dropna=False)` ;) – rafaelc Jul 24 '18 at 15:12
  • This is not working; it adds extra NaN columns to the dataframe. Issue not resolved yet @RafaelC – id101112 Jul 24 '18 at 15:30
  • @id101112 what about the second method? `pd.Series([z for x in df.col.values for z in x])` ? – rafaelc Jul 24 '18 at 15:31
  • Yes, this method works fine, but I also need the repeated ids shown in my required output, and I only get those from df.index. I am not sure how to get them with the loop, so the first method was better to apply. – id101112 Jul 24 '18 at 15:48
  • @RafaelC thanks, I figured it out, and it worked for me. – id101112 Jul 24 '18 at 16:08