Given a Series of strings in the following format:
["s1, s2, s3, s4", ...]
I would like to create a two-column DataFrame like this:
[[s1, s2], [s2, s3], [s3, s4]]
Currently I'm stuck on how to go from a Series to a DataFrame.
I believe you need a list comprehension that flattens the output of a window function, which yields a sliding window over each split string:
import pandas as pd

s = pd.Series(["s1, s2, s3, s4", "s1, s2, s3"])
print (s)
0    s1, s2, s3, s4
1        s1, s2, s3
dtype: object
from itertools import islice

# https://stackoverflow.com/a/6822773/2901002
def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Split each string on a comma plus whitespace, then flatten the windows of every row.
a = [y for x in s.str.split(r',\s+') for y in window(x)]
print (a)
[('s1', 's2'), ('s2', 's3'), ('s3', 's4'), ('s1', 's2'), ('s2', 's3')]
df = pd.DataFrame(a, columns=['a','b'])
print (df)
    a   b
0  s1  s2
1  s2  s3
2  s3  s4
3  s1  s2
4  s2  s3
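For the width-2 case specifically, the same pairs can probably be built without a helper function by zipping each split list with itself shifted by one. This is only a sketch under the assumption that every row is a comma-separated string; the names `pairs` and `parts` are mine, not from the question.

```python
import pandas as pd

s = pd.Series(["s1, s2, s3, s4", "s1, s2, s3"])

# Split each row on a comma followed by optional whitespace,
# then pair every item with its successor in the same row.
pairs = [
    pair
    for parts in s.str.split(r",\s*")
    for pair in zip(parts, parts[1:])
]

df = pd.DataFrame(pairs, columns=["a", "b"])
print(df)
```

Rows with fewer than two items simply contribute no pairs, because `zip` stops at the shorter input.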
With some formatting caveats, you can reshape the Series values, as described in the question "Reshape of pandas series?".
Note that I separated your s# elements into separate strings, and that the (2, 2) reshape only works for a Series with exactly 4 elements.
import pandas as pd
s = pd.Series(['s1', 's2', 's3', 's4']).values.reshape((2,2))
print(s)
df = pd.DataFrame(s)
df
Output:
    0   1
0  s1  s2
1  s3  s4
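If the fixed (2, 2) shape is too restrictive, a rough sketch of two variations follows. Both assume the elements are already separate strings, as above. Passing `-1` lets NumPy infer the row count for any even-length input, and slicing pairs each element with its successor to get the overlapping rows the question asks for. The variable names are illustrative only.

```python
import pandas as pd

values = pd.Series(['s1', 's2', 's3', 's4']).to_numpy()

# Non-overlapping pairs: -1 lets NumPy infer the number of rows,
# so this works for any even number of elements.
non_overlapping = pd.DataFrame(values.reshape(-1, 2))

# Overlapping (sliding) pairs: each element next to its successor.
overlapping = pd.DataFrame({'a': values[:-1], 'b': values[1:]})

print(non_overlapping)
print(overlapping)
```

The reshape still raises a `ValueError` for an odd number of elements, while the slicing variant works for any length of two or more.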