While this can be solved using regex, I opted for a Python solution, borrowing from here.
First, a helper function to determine if a string contains a digit.
def hasNumbers(inputString):
return any(char.isdigit() for char in inputString)
Second, a function to classify objects into your stated types.
def determine_type(x):
# check if x is a number
if isinstance(x, (int, float, np.int64, np.float64)):
return "int or float"
# check if x is a string
elif isinstance(x, str):
# check if the string contains a digit
if hasNumbers(x):
return "alphanumeric"
else:
# it's a string
return "string"
Note the edge case here of an int
or float
being cast a string. "5" is alphanumeric.
Let's make our dataframe, and convert whatever can be converted to a number, to a number:
"""
name emp_id design salary
0 Rick F912-1 Engg 4500
1 Monty 56Df1 Doc 6700
2 Chris J45-12 Arch2 7800.0
3 Julie Klgi Engg-A KL00
4 Mashy1 5670 Law_y 4100.0
"""
df1 = pd.read_clipboard().reset_index(drop=True)
df1 = df1.applymap(lambda x: pd.to_numeric(x, errors="ignore"))
display(df1)
Output:
name emp_id design salary
0 Rick F912-1 Engg 4500
1 Monty 56Df1 Doc 6700
2 Chris J45-12 Arch2 7800
3 Julie Klgi Engg-A KL00
4 Mashy1 5670 Law_y 4100
Now, let's melt
this dataframe into the format you requested, and apply our custom function.
# melt the df
df1_melted = df1.reset_index().melt(id_vars="index").sort_values("index")
# apply our function
df1_melted["type"] = df1_melted["value"].apply(determine_type)
print(df1_melted.head(10))
Output:
index variable value type
0 0 name Rick string
5 0 emp_id F912-1 alphanumeric
10 0 design Engg string
15 0 salary 4500 int or float
1 1 name Monty string
6 1 emp_id 56Df1 alphanumeric
11 1 design Doc string
16 1 salary 6700 int or float
2 2 name Chris string
17 2 salary 7800 int or float
The bulk of the work here was writing (hacking) a custom function to determine datatypes. I really recommend sticking with the built-in types, either pure Python or numpy
. If something is a string, then you can do further analysis.