3

There are other similar questions, but the difference here is that my dataframe already has a lot of columns, only one of which needs to be split.

I have a large dataframe(hundreds of columns, millions of rows). I would like to split one of these columns when a character ("|") is found in the string.

All values have only one "|".

For a fixed length I would do this: df['StateInitial'] = df['state'].str[:2]

I wish I could replace the 2 by string.index("|"), but how do I call the string?

Alexis Eggermont
  • 7,665
  • 24
  • 60
  • 93
  • Possible duplicate of [How to split a column into two columns?](https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns) – LeoRochael Jan 11 '18 at 20:47

4 Answers4

7

How about:

df = pd.DataFrame(['a|b', 'c|d'])
s = df[0].apply(lambda x: x.split('|'))
df['left'] = s.apply(lambda x: x[0])
df['right'] = s.apply(lambda x: x[1])

Output:

     0 left right
0  a|b    a     b
1  c|d    c     d
santon
  • 4,395
  • 1
  • 24
  • 43
  • 2
    See also `pandas.Series.str.split()` [pandas.Series.str.split][1] [1]: https://pandas.Series.str.split – Martien Lubberink Nov 08 '18 at 21:45
  • 2
    Link doesn't work.. [new link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html) – polka Aug 05 '19 at 19:10
2

Here is a one liner that builds on the answer provided by @santon:

df['left'],df['right'] = zip(*df[0].apply(lambda x: x.split('|')))

>>> df 
     0 left right
0  a|b    a     b
1  c|d    c     d
khammel
  • 2,047
  • 1
  • 14
  • 18
1

First, set you new column values equal to the old column values.

Next, create a new column with values initially equal to None.

Now, update the new column with valid values of the first.

df['new_col1'] = df['old_col']
df['new_col2'] = None
df['new_col2'].update(df.new_col1.apply(lambda x: x.str.split('|')[1] 
                      if len(x.str.split()) == 2 else None))
Alexander
  • 105,104
  • 32
  • 201
  • 196
0

If you have a column of strings, with a delimiter '|' you can use the following line to split the columns:

df['left'], df['right'] = df['combined'].str.split('|', 1).str

LeoRochael has a great in-depth explanation of how this works over on a separate thread: https://stackoverflow.com/a/39358924/11688667