
I am looking to convert data frame df1 to df2 using Python: each colon-delimited value in Test2 should become its own row, with the other columns repeated. I have a solution that uses loops, but I am wondering if there is an easier way to create df2.

df1

   Test1   Test2   2014  2015  2016  Present
1     x        a     90    85    84        0
2     x      a:b     88    79    72        1
3     y    a:b:c     75    76    81        0
4     y        b     60    62    66        0
5     y        c     68    62    66        1

df2

   Test1  Test2   2014  2015  2016  Present
1     x       a     90    85    84        0
2     x       a     88    79    72        1
3     x       b     88    79    72        1
4     y       a     75    76    81        0
5     y       b     75    76    81        0
6     y       c     75    76    81        0
7     y       b     60    62    66        0
8     y       c     68    62    66        1
smci
arqchicago
  • Interesting question. However, please pick either R or Python as to what answer(s) you want for this question (eg: remove one of the tags). If you want to ask another for the other one - that's great. (It just gets awkward for others to find such techniques later when they're both combined into one). – Jon Clements Aug 08 '18 at 16:12
  • (It'd also help for your chosen language if you include your current solution) – Jon Clements Aug 08 '18 at 16:14
  • Check `separate_rows` function from `tidyr` – AntoniosK Aug 08 '18 at 16:16
  • Thanks. I changed the tags to Python. – arqchicago Aug 08 '18 at 16:17
  • Did it work? I am seeing only python tag now. – arqchicago Aug 08 '18 at 16:21
  • @arqchicago it's reopened but it'd still be a great help if you include your existing solution you're not happy with please. – Jon Clements Aug 08 '18 at 16:25
  • also of [Split (explode) pandas dataframe string entry to separate rows](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows) – smci Aug 08 '18 at 17:10

2 Answers


Here's one way using numpy.repeat and itertools.chain:

import numpy as np
import pandas as pd
from itertools import chain

# split by delimiter and calculate length for each row
split = df['Test2'].str.split(':')
lens = split.map(len)

# repeat non-split columns
cols = ('Test1', '2014', '2015', '2016', 'Present')
d1 = {col: np.repeat(df[col], lens) for col in cols}

# chain split columns
d2 = {'Test2': list(chain.from_iterable(split))}

# combine in a single dataframe
res = pd.DataFrame({**d1, **d2})

print(res)

   2014  2015  2016  Present Test1 Test2
1    90    85    84        0     x     a
2    88    79    72        1     x     a
2    88    79    72        1     x     b
3    75    76    81        0     y     a
3    75    76    81        0     y     b
3    75    76    81        0     y     c
4    60    62    66        0     y     b
5    68    62    66        1     y     c
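For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-and-chain idea. It rebuilds df1 from the question and wraps the steps in a helper; `split_rows` is a hypothetical name used only for illustration, and converting to NumPy arrays before repeating is an assumption made to get a fresh 0-based index.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Rebuild df1 from the question (row labels 1..5, as shown there).
df = pd.DataFrame(
    {
        "Test1": ["x", "x", "y", "y", "y"],
        "Test2": ["a", "a:b", "a:b:c", "b", "c"],
        "2014": [90, 88, 75, 60, 68],
        "2015": [85, 79, 76, 62, 62],
        "2016": [84, 72, 81, 66, 66],
        "Present": [0, 1, 0, 0, 1],
    },
    index=range(1, 6),
)


def split_rows(frame, column, sep=":"):
    """Hypothetical helper: one output row per delimited value in `column`."""
    parts = frame[column].str.split(sep)
    lens = parts.str.len().to_numpy()
    # Repeat every other column once per split value in the same row.
    others = [c for c in frame.columns if c != column]
    data = {c: np.repeat(frame[c].to_numpy(), lens) for c in others}
    # Flatten the per-row lists into one long list of values.
    data[column] = list(chain.from_iterable(parts))
    # Restore the original column order.
    return pd.DataFrame(data)[list(frame.columns)]


res = split_rows(df, "Test2")
res.index = range(1, len(res) + 1)  # 1-based labels, as in df2
print(res)
```

Keeping the column name and separator as parameters means the same helper could be reused for other delimited columns, though it has only been reasoned through for the data in the question.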
jpp

This will achieve what you want:

import pandas as pd

# Convert the "Test2" strings into lists of values
df["Test2"] = df["Test2"].str.split(":")

# Build a Series with one "Test2" value per row, keyed by the original row label
test2 = df.apply(lambda x: pd.Series(x['Test2']), axis=1).stack().reset_index(level=1, drop=True)
test2.name = 'Test2'

# Drop the list column and join the flattened values back on the row label
df = df.drop('Test2', axis=1).join(test2)

print(df)

  Test1 2014 2015 2016 Present Test2
1     x   90   85   84       0     a
2     x   88   79   72       1     a
2     x   88   79   72       1     b
3     y   75   76   81       0     a
3     y   75   76   81       0     b
3     y   75   76   81       0     c
4     y   60   62   66       0     b
5     y   68   62   66       1     c
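On newer pandas versions (0.25 and later), the split-then-flatten steps above can probably be shortened with `DataFrame.explode`. This is a different technique from the stack/join approach in this answer, shown only as a sketch; the frame below is df1 rebuilt from the question.

```python
import pandas as pd

# Rebuild df1 from the question (row labels 1..5, as shown there).
df = pd.DataFrame(
    {
        "Test1": ["x", "x", "y", "y", "y"],
        "Test2": ["a", "a:b", "a:b:c", "b", "c"],
        "2014": [90, 88, 75, 60, 68],
        "2015": [85, 79, 76, 62, 62],
        "2016": [84, 72, 81, 66, 66],
        "Present": [0, 1, 0, 0, 1],
    },
    index=range(1, 6),
)

# Split each string into a list, then give every list element its own row.
df2 = (
    df.assign(Test2=df["Test2"].str.split(":"))
    .explode("Test2")
    .reset_index(drop=True)
)
df2.index = range(1, len(df2) + 1)  # 1-based labels, as in the question's df2
print(df2)
```

Unlike the join-based version, this keeps the original column order, so Test2 stays in second position.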


iacob