2

I have a column in an excel which contains a mix of First Names, Last Names and Job titles. Only pattern that can be observed is - in each set of 3 rows, every 1st row is first name, 2nd row is last name and 3rd row is job title. I want to create 3 different columns and and segregate this data Sample data:

John
Bush
Manager
Katrina
Cohn
Secretary 

I want: John , Bush , Manager as one row going in three different columns under First Name, Last name and Job title respectively. Like -

First Name   Last Name    Job Title
John         Bush         Manager
Katrina      Cohn         Secretary 

How can we achieve this task?

ComplexData
  • 1,091
  • 4
  • 19
  • 36

2 Answers2

3

You can use this notation to get every third element with different starting points.

l = ['John', 'Bush', 'Manager', 'Katrina', 'Cohn', 'Secretary']

pd.DataFrame({'First Name': l[::3], 'Last Name': l[1::3], 'Job Title': l[2::3]})

outputs

  First Name  Job Title Last Name
0       John    Manager      Bush
1    Katrina  Secretary      Cohn
Alex
  • 18,484
  • 8
  • 60
  • 80
0
s = pd.Series([
        'John',
        'Bush',
        'Manager',
        'Katrina',
        'Cohn',
        'Secretary'])

df = pd.DataFrame(s.values.reshape(-1, 3),
                  columns=['First Name', 'Last Name', 'Job Title'])

df

enter image description here


If your length of your data isn't a multiple of 3 then you can force it like this:

s = pd.Series([
        'John',
        'Bush',
        'Manager',
        'Katrina',
        'Cohn',
        'Secretary',
        'Bogus'])

s_ = s.iloc[:s.shape[0] // 3 * 3]
df = pd.DataFrame(s_.values.reshape(-1, 3), columns=['First Name', 'Last Name', 'Job Title'])

df

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • ValueError: total size of new array must be unchanged – ComplexData Aug 04 '16 at 23:20
  • @user6461192 that means your series isn't of a length that is a multiple of 3. See edited post. – piRSquared Aug 04 '16 at 23:21
  • Thanks that worked. What can we do if we need to take an excel as an input, instead go hard coding the data in Series – ComplexData Aug 04 '16 at 23:26
  • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html http://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook – piRSquared Aug 04 '16 at 23:28