1

So let's say I have data like this with some delimiter like commas that I want to split to new cells either across to columns or down into rows.

The Data Location
One Museum, Two Museum City A
3rd Park, 4th Park, 5th Park City B

How would you do it in either direction? There are lots of methods why is methods provided preferred?

Looking for methods in:

  • Python
  • Excel
  • Power Query
  • R
Shane S
  • 1,747
  • 14
  • 31
  • 1
    Why is this tagged `r`? – Rui Barradas May 12 '22 at 06:21
  • @RuiBarradas, Thank you for the question. I thought it might be nice to see how this is done in r. I am not an expert in r but if you wan to answer it in r that would be awesome. – Shane S May 12 '22 at 07:03
  • "How would you do it in either direction?" I would never do it to new columns, imagine a big city with like hundreds of hotspots, you do not want to have that amount of columns as the city with the most, while a small village just have a handful. Besides there would not be a purpose of the column itself. Always do it in rows. – Merijn van Tilborg May 12 '22 at 07:23
  • 1
    **R language:** Here is how to split column by delimiter into [multiple columns](https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns) and into [multiple rows](https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows). – Rui Barradas May 12 '22 at 07:48

3 Answers3

3

The Excel manual method: click on Data>Text to Column. Now just copy and past if you want the data in one column. This is only good when the data set is small and your are doing it once.

The Power Query method: This method you do it once for the data source then click refresh button when the data changes in the future. The data source can be almost anything like csv, website or etc. Steps below:

1 - Pick your data source

2 - When within excel choose From Table/ Range

enter image description here enter image description here

3 - Now choose the split method, there is delimiter and there is 6 other choices.

4 - For this data I went with custom and use ", "

5 & 6 - To split down you have to select Advanced options. Make the selection.

7 Close & Load

This is a good method because you don't have to code in Power Query unless you want to.

Shane S
  • 1,747
  • 14
  • 31
3

The Python method Make sure you have pip installed pandas or use conda to install pandas.

The code is like so:

import pandas as pd

df = pd.read_excel('path/to/myexcelfile.xlsx')
df[['key.0','key.1','key.2']] = df['The Data'].str.split(',', expand=True)
df.drop(columns=['The Data'], inplace = True)
# stop here if you want the data to be split into new columns

The data looks like this

   Location key.0       key.1       key.2
0   City A  One Museum  Two Museum  None
1   City B  3rd park    4th park    5th park

To get the split into rows proceed with the next code part:

stacked = df.set_index('Location').stack()
# set the name of the new series created
df = stacked.reset_index(name='The Data')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)

Now this is done and it looks like this

    Location  The Data
0   City A    One Museum
1   City A    Two Museum
2   City B    3rd park
3   City B    4th park
4   City B    5th park

The benefit of python is that is faster for larger data sets you can split using regex in probable a 100 ways. The data source can be all types that you would use for power query and more.

Shane S
  • 1,747
  • 14
  • 31
1

R

library(data.table)

dt <- fread("yourfile.csv") # or use readxl package for xls files

dt

#                            Data Location
# 1:       One Museum, Two Museum   City A
# 2: 3rd Park, 4th Park, 5th Park   City B


dt[, .(Data = unlist(strsplit(Data, ", "))), by = Location]

#    Location       Data
# 1:   City A One Museum
# 2:   City A Two Museum
# 3:   City B   3rd Park
# 4:   City B   4th Park
# 5:   City B   5th Park
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22