0

I just started learning PYTHON. I tried to search an answer for my problem but didn't have luck.

I have an excel file with multiple columns.

For example, this is what I have in the Excel file.

Current Data Set

and I would like to change the file to look like below. I used "Text to Columns" on Excel to do this(highlighted in yellow), but couldn't figure out how to do it using Python without affecting other columns.

Desired outcome

I would greatly appreciate your help!

Best, Tae

Tae Yoon
  • 19
  • 3
  • 1
    Have you heard of Pandas http://pandas.pydata.org/? – David Oct 30 '18 at 03:05
  • @David I've tried split using Pandas, but couldn't make it work. – Tae Yoon Oct 30 '18 at 03:07
  • 1
    your solution should be `segg = lambda x: pd.Series([i for i in reversed(x.split('-'))])` then `df['Information'].apply(segg)` – Karn Kumar Oct 30 '18 at 03:25
  • @pygo Somehow, I am still getting an error (some messages are: Key Error: 'Information', # get column, #duplicate columns & possible reduce dimensionality) – Tae Yoon Oct 30 '18 at 03:45
  • @TaeYoon, can you point the CSV file somewhere to have a look or reproduced the results because its running for my example. – Karn Kumar Oct 30 '18 at 03:47
  • @pygo of course! Thank you so much for your help! It would be great if the code doesn't affect the other numeric columns -- uploaded to Dropbox: https://www.dropbox.com/s/43ds9en9hnujffw/Example%20for%20Pygo.xlsx?dl=0 – Tae Yoon Oct 30 '18 at 03:52
  • @TaeYoon, done.. what you have to do it Just use `pd.read_excel`, see the ans. – Karn Kumar Oct 30 '18 at 04:24

4 Answers4

1

This should go something like below:

data['a'], data['col2'] = data['Information'].str.split('-', 1).str
data['b'], data['col3'] = data['col2'].str.split('-', 1).str
data['c'], data['col4'] = data['col3'].str.split('-', 1).str
data['d'], data['e'] = data['col4'].str.split('-', 1).str

This may not be the efficient way but will work for sure. This will spilt col Information in 5 different columns

Updated answer as per updated data in question

data = pd.read_excel("/path/to/file/Example for Pygo.xlsx")
data['a'], data['col2'] = data['Information'].str.split('-', 1).str
data['b'], data['col3'] = data['col2'].str.split('-', 1).str
data['c'], data['col4'] = data['col3'].str.split('-', 1).str
data['d'], data['e'] = data['col4'].str.split('-', 1).str
data = data.drop(['Information','col2', 'col3', 'col4'], axis = 1)
Upasana Mittal
  • 2,480
  • 1
  • 14
  • 19
  • Somehow, I am still getting an error (some messages are: Key Error: 'Information', # get column, #duplicate columns & possible reduce dimensionality) – Tae Yoon Oct 30 '18 at 03:46
  • @TaeYoon Please check. I have updated answer. Do remember to install package `xlrd` if not downloaded before with `pip install xlrd` in virtualenv. – Upasana Mittal Oct 30 '18 at 04:59
0

Check out the string.split() method. You can pass in an argument to split on, in this case string.split('-')

array[index]=array[index].split('-')
samlli
  • 106
  • 5
0

one easy way is to use dataframe to process the dataset. 1. read the xls file into dataframe using, you may find the details here xls into dataframe

  1. Now use merge, lambda and split.

please find examples below.

Example - 2 lines only

import pandas as pd

df = pd.read_excel(open('/Users/xxx/Downloads/ExampleforPygo.xlsx','rb'), sheet_name=0)
df = df.merge(df.apply(lambda row: pd.Series(row['Information'].split('-')), axis=1), left_index=True, right_index=True)

print(df)

Example with separate function.

    import pandas as pd

    def splitInfomation(information):
        ret = {}
        splits = information.split('-')
        for idx, split in enumerate(splits):
            ret['split' + str(idx)] = split
        return pd.Series(ret)

    df = pd.read_excel(open('/Users/xxxx/Downloads/ExampleforPygo.xlsx','rb'), sheet_name=0)

    df = df.merge(df.apply(lambda row: splitInfomation(row['Information']), axis=1), left_index=True, right_index=True)

    print(df)
Prince Francis
  • 2,995
  • 1
  • 14
  • 22
0

Updated the Answer based on your example file given, in your case the datafile is xlsx so, you have to do like below, You can use Just str.split method to get the Job done, i also used fillna in case whereas no values Just mark them None.

When using expand=True, the split elements will expand out into separate columns.

>>> import pandas as pd
>>> pd.set_option('display.height',     None)
>>> pd.set_option('display.max_rows',   None)
>>> pd.set_option('display.max_columns',None)
>>> pd.set_option('display.width',      None)


>>> data_xls = pd.read_excel("Example_data.xlsx", index_col=None).fillna('')
>>> data_xls['Information'].str.split('-', expand=True).head(30)
     0        1                    2            3                     4
0   us  EXAMPLE             article1   scrolldown            findoutnow
1   us  EXAMPLE             article1  scrollright                  None
2   us  EXAMPLE             article1   findoutnow                  None
3   us  EXAMPLE   payablesmanagement   findoutnow                  None
4   us  EXAMPLE  strategicpurchasing  scrollright                  None
5   us  EXAMPLE             article1    learnmore         profitmargins
6   us  EXAMPLE   payablesmanagement  scrollright                  None
7   us  EXAMPLE             article2  scrollright                  None
8   us  EXAMPLE  controlandvisibilty   findoutnow                  None
9   us  EXAMPLE             article1   scrollleft                  None
10  us  EXAMPLE             homepage     amexlogo                  None
11  us  EXAMPLE        profitmargins   findoutnow                  None
12  us  EXAMPLE             article3   findoutnow                  None
13  us  EXAMPLE             article1    learnmore    payablesmanagement
14  us  EXAMPLE             article2   scrollleft                  None
15  us  EXAMPLE             article3  scrollright                  None
16  us  EXAMPLE             homepage     readmore    payablesmanagement
17  us  EXAMPLE             article1         None                  None
18  us  EXAMPLE             homepage      homenav            findoutnow
19  us  EXAMPLE  controlandvisibilty  scrollright                  None
20  us  EXAMPLE             homepage      homenav    payablesmanagement
21  us  EXAMPLE             homepage       scroll            findoutnow
22  us  EXAMPLE             article3   scrollleft                  None
23  us  EXAMPLE             article1    learnmore   strategicpurchasing
24  us  EXAMPLE             article1    learnmore  controlandvisibility
25  us  EXAMPLE             article1   scrolldown            findoutnow
26  us  EXAMPLE             article1  scrollright                  None
27  us  EXAMPLE             article1   findoutnow                  None
28  us  EXAMPLE   payablesmanagement   findoutnow                  None
29  us  EXAMPLE  strategicpurchasing  scrollright                  None

Borrowed From @Jon.. to get the whole dataset along with your orignal ones & new ones included...

>>> data_xls.join(data_xls['Information'].str.split('-', expand=True).add_prefix('newCol_')).head()

        Date                                 Information  EXAMPLE_LinkedIn_SponsoredContent_Visits  EXAMPLE_LinkedIn_inMail_Visits  EXAMPLE_DBM_Native_Visits  EXAMPLE_SGCPB_Native_Visits  EXAMPLE_SGCBDC_Email_Visits  EXAMPLE_SGCPB_Email_Visit  \
0 2018-08-20   us-EXAMPLE-article1-scrolldown-findoutnow                                         0                               0                          0                            0                            0                          0
1 2018-08-20             us-EXAMPLE-article1-scrollright                                         0                               0                          0                            0                            0                          0
2 2018-08-20              us-EXAMPLE-article1-findoutnow                                         1                               0                          1                            0                            0                          0
3 2018-08-20    us-EXAMPLE-payablesmanagement-findoutnow                                         0                               0                          0                            0                            0                          0
4 2018-08-20  us-EXAMPLE-strategicpurchasing-scrollright                                         0                               0                          0                            0                            0                          0

   EXAMPLE_SGCBDC_Native_Visits  EXAMPLE_ConstructionDive_Email_Visit  EXAMPLE_ConstructionDive_PromotedStory_Visit  EXAMPLE_SGCPB_PromotedStory_Visit  EXAMPLE_SGCBDC_PromotedStory_Visit  EXAMPLE_ConstructionDive_Native_Visits newCol_0 newCol_1  \
0                             0                                     0                                             0                                  0                                   0                                       0       us  EXAMPLE
1                             0                                     0                                             0                                  0                                   0                                       0       us  EXAMPLE
2                             0                                     0                                             0                                  0                                   0                                       0       us  EXAMPLE
3                             0                                     0                                             0                                  0                                   0                                       0       us  EXAMPLE
4                             0                                     0                                             0                                  0                                   0                                       0       us  EXAMPLE

              newCol_2     newCol_3    newCol_4
0             article1   scrolldown  findoutnow
1             article1  scrollright        None
2             article1   findoutnow        None
3   payablesmanagement   findoutnow        None
4  strategicpurchasing  scrollright        None
Karn Kumar
  • 8,518
  • 3
  • 27
  • 53
  • 1
    You can just use `data_xls['Information'].str.split('-', expand=True)` - no need for the lambda stuff here... – Jon Clements Oct 30 '18 at 04:24
  • 1
    Could probably even make it: `data_xls.join(data_xls['Information'].str.split('-', expand=True).add_prefix('word_'))` so you get named columns out of it plus the original ones untouched for output... – Jon Clements Oct 30 '18 at 04:29
  • Okay, and if you do a `pd.set_option('display.width', 2000)` and then `print` your example output DF - you'll find it'll fit more nicely to show here :) – Jon Clements Oct 30 '18 at 04:36
  • @JonClements. included all the points and padas display setting altogether, thnx again :-) – Karn Kumar Oct 30 '18 at 04:42
  • @pygo Thank you so much for looking into this! I am going to try this right now! :) – Tae Yoon Oct 30 '18 at 13:22
  • @pygo Hi! your code works perfect if I don't delete any rows. Is there a way that I can make your code work after deleting some rows? Here is my new post with more details: https://stackoverflow.com/questions/53066413/python-splitting-one-column-into-multiple-after-deleting-some-rows – Tae Yoon Oct 30 '18 at 14:29
  • @TaeYoon, let me have a look. – Karn Kumar Oct 30 '18 at 16:00